Good morning, afternoon, or evening, wherever you are. Thank you for joining me at this deep dive today. I'll talk about materials, code, and data transparency. I am going to take maybe a little bit more time than you thought talking about why I think we should be transparent, and then also take some time on how you can do this in real life.

My name is Willa van Dyke. Currently, I'm a Dean's Postdoctoral Scholar at Florida State University in the Department of Psychology, and I'm affiliated with the Florida Center for Reading Research. This is in Tallahassee, Florida, where it's currently sunny but not warm. In my work I focus on preventing reading failure in young children by understanding individual differences in reading development, effective interventions, and effective teachers. I use a lot of advanced quantitative methods in my work, I try to adhere to open science practices, and I do my best to use these practices in educational science and promote them as well.

I'm currently funded through LDbase, which is an NIH grant here at FCRR with Drs. Hart and Schatschneider. We are building a data repository specifically for achievement and behavioral data for students with learning differences, or about learning differences. That has really been my big open science assignment in these last two years that I've been here at Florida State.

I have slides and materials available on the OSF project page for this talk. There's a link here. And you can find me on Twitter at Willa Van Dyke. I tweet stuff about my life as a researcher. Sometimes that is fun and sometimes that's not so great.

When people ask why we should share data, I often start with the Hubble Space Telescope. It brought us these awesome pictures from the universe. Here we have star cluster NGC 602 on the top left. We have HH 24, also called the lightsaber.
We have the Orion Nebula and the Sombrero Galaxy. There are just all these amazing images that came from NASA and the Space Telescope Science Institute, and I think a lot of you have seen these pictures or other famous pictures that came from the Hubble telescope in its 31 years of existence. They're almost celebrating their 31st anniversary. I was going to say birthday, but that doesn't seem right.

Hubble as a project, however, is much more than these pretty pictures. In these 31 years, this telescope has made over 1.4 million observations. With these observations, 17,000 peer-reviewed studies have been published, which generate about 150 citations daily, with a total of almost a million citations. It has these famous images that are in every astronomy textbook in the world. It generates 150 gigabits of raw science data every week, with a total archive of 150 terabytes of data that is openly available to the public.

If you go to the Hubble website, you can download data from the Hubble telescope. The only restriction is for ongoing projects: researchers can book time on the telescope, it will gather data for them, and they have a year to use that data to do their analysis and write up their papers. After a year, all the data is freely available to everyone else. So it's clear to me that open sharing of data has generated this insane amount of peer-reviewed articles and publications. If you were a young astronomer without any money to do any observations, you could go to the Hubble data hub, get data, and write your papers, which I think is amazing.

If you'd like to fun up your Twitter feed a little bit, because it can get somewhat drab lately, follow these fun Hubble accounts to get astronomy in your feed.

But besides the Hubble,
which is obviously one of the best examples, we have some good examples in education and developmental psychology. A very good example is the TalkBank system, of which CHILDES is a subcomponent. I don't know if you're familiar with CHILDES, but it's a language data bank. It currently has 230 corpora, covering 30 different languages. About 3,100 peer-reviewed papers have been written with data from TalkBank and CHILDES. It has over 4,000 users. And again, this data is openly available to the public at that website.

When I think about the Hubble and this CHILDES data set, there is a way that 230 investigative groups could have written 3,100 peer-reviewed articles, but it seems very unlikely to me that in 10 years each of these groups wrote well over a dozen articles. So there's just a lot more to be done with data than one researcher can do by him or herself.

A lot has already been said about the benefits of transparency and sharing for specific fields, or for science in general. For example, there is the idea that new ideas will be generated faster, and those ideas will then advance the specific field faster. It will increase transparency in the research process: maybe people have limited trust in other people's science, and the more you share, the more transparent it is, and the more people will believe in your trustworthiness. And it increases the collaboration that comes through sharing your data or your science with other people.

But what I really wanted to focus on and talk to you about today for a while is the fact that I think transparency helps promote equity in research. What I mean by this, more specifically, is that by sharing your data, your materials, and your code, you can support graduate students who don't have resources. You can support early career faculty, faculty from underrepresented groups in, I'm going to say education,
but if you're in a related field, in any field, or faculty and groups at under-resourced institutions. You can help them get learning opportunities and research opportunities.

That will be the first part of today's talk: sharing experiences that I have had throughout my career, which is not very long yet, but will hopefully continue for another couple of decades. After that I want to talk about tips and tricks for sharing materials, code, and data if you have a new project. And then I was going to cover, though it's actually only one slide, what you do when you share elements of a project that has already finished or has already started.

So I want to start with my experience with transparency and how this has paved the way for my journey to the stars. I hope it will be to the stars.

My first encounter with transparency was during my master's degree, and I actually saw that my mentor for that project is on our talk today. Hi, Pam. I was planning my thesis study with her in special education, and I read this article about an intervention that I thought was just amazing. I may have been a little bit in love with this study. I really wanted to replicate it. I was super excited about it. I read the article many times and made notes. It had a fair amount of detail, but I was unsure about some of the key elements and how to address them, so I decided to email the author and ask for more details. I was a master's student at this time, and I think it's a bold step to email somebody when you've read their work, you're super excited about it, and you're kind of in awe of this person. So I emailed them. That was scary. I was really pumped with adrenaline. I waited a couple of weeks. I didn't hear anything.
I was like, oh, I don't know. It turns out she wasn't actually at that institution anymore, but I had just used the email address that was on the article and didn't check it. So I emailed her again at her current email address, and finally we got a response back. And that response said: no, I will not give you more details, because my intervention has been copyrighted and you can't do this. I was a little stunned, disappointed, and sad. I deliberated with my mentor, and we emailed once more to talk about replication. That didn't work out so well either. So we decided to drop the request and do the study anyway with the information that was in the article. I filled in the blanks from the article with what I thought should be happening. I completed my study, I graduated, it was all great. But I was a bit disillusioned about the field and how we talk about doing interventions. If you're not allowed to replicate anything, how would we know if an intervention is really going to work? It was my impression that we are all here to help children, so if somebody wants to replicate your work and help other children, I thought that would be great. But apparently not everybody thinks the same way.

Anyway, I graduated, I moved on to a PhD program, and I soon forgot all about this incident because I was busy with doc student things. So I'm going to fast forward to the third year of my PhD program. I was taking an advanced class in structural equation modeling, and I needed to do an applied project as a final, using one of the methods that were part of the course that semester. I really like statistics. Yay, statistics. I love statistics. But if you know something about complex structural equation models, you know you need a large data set for that.
Well, I'm in special education. We don't have that many large data sets, because sometimes it's only 10 kids with a particular problem. Lucky for me, one of my faculty advisors in my doc program did a lot of secondary data analysis, and he helped me get access to one of the data sets from the National Center for Teacher Effectiveness main study. That's publicly available through ICPSR. You do have to apply to get it. He was affiliated, and so I was able to work on that data set. So that was one problem solved: I had some open data to work with.

I thought, well, that's great, I'm set now. But then I remembered I actually had to model something pretty complex, and I didn't really know how to model it. So what I did was try to find some example papers that used the same type of model and see if anyone had shared their Mplus code, so that I could adapt it to what I needed. I didn't find anything. So sad Willa spent a lot of time googling this model and trying to figure something out. Eventually I figured the model out, and it was good, but I definitely had some headaches over this modeling and not being able to find any of the code. So that was another sad story of what happens when people don't share stuff.

But it can be different. Last year, during my postdoc, I really experienced the benefit of code sharing. We're working on integrative data analysis problems that have to do with the data repository and how we want to connect different data sets and make them into one big data set. Dan Bauer, who is one of the people behind IDA, actually posts code on his website. And it's more than just code, as you can see. It's kind of scribbly, but he has annotations in his code. This is the code that goes with one of their original papers, and I have the citation here below. He actually goes through and tells you what each of these elements is. And so I was like, level up. This is amazing.
I can just copy this, adapt it to the variables I have, and run this pretty complex model. And I'm learning about how to reparameterize things in this model. It was amazing. I learned so much, and all I had to do was go to his website. That was really a moment for me where I was like, oh my gosh, this would have taken me a year to figure out by myself, and this professor just has it on his website and you can learn from it. It was an amazing feeling to be able to do this pretty complex model by following along with his code. So that's sharing code with people.

I'm going to jump back a little bit, because I also want to talk about sharing data. I'm going to back up to the last year of my doc program, because I had this brilliant idea for my dissertation. You all know this feeling. It was going to revolutionize my field. I was going to win so many awards for this dissertation. It was going to be awesome. I had it all figured out. I had it written up. I knew exactly what I needed to do. The problem was that this project needed a lot of data from a lot of kids. I was a fourth-year doc student without any grant funding. So my advisor was like, no, we're not doing this, because you don't have time and you don't have money to pay other people. I'm sorry, but your brilliant idea has to go on the shelf until you get a real job and have people doing stuff for you. So another sad day for me.

I had some stressful weeks after she said that. I am grateful to her, because it would never have worked out, and I probably would still have been working on this project right now, two years later. But it was some stressful weeks afterwards, deciding what I was going to do instead. If I want to do secondary data analysis because I can't collect these data myself, where am I going to get these data?
And so I got very lucky, because Mike Coyne, who is in this picture, a professor at the University of Connecticut, had a big data set. He's a friend of one of my committee members. He had a boatload of data, and they were okay with me using these data. I was super excited about that, and I'm very grateful. The only catch was that I could use the data, but I also had to clean the data set. I see some people snickering already. If you've ever cleaned somebody else's data set, you know that took me a while, a couple of months, but I got it done, and I was able to use these data, which was amazing. I did my dissertation. I graduated. I didn't win any awards for this dissertation, but that was fine, because it got me the job that I have right now, my postdoc about open science, and I really got into this idea of sharing your data openly and doing things with it so other people can learn. I must say, if you're not on Twitter, you should be, because that's where I found my job as a postdoc, and it's amazing.

So this was my totally anecdotal, n-of-one account of why you should share stuff to help other people and improve equity. I know it's really not representative and probably not generalizable to the general public. But I really wouldn't be where I am today, and obviously invested in open science, if I hadn't experienced what a lack of transparency means and been able to compare that with what having transparency means, for my science and in general. I really felt the contrast between not having transparent code, and trying to figure out how it works by yourself, and having code that's readily available that you can learn from, even if you're not in this person's class or spending loads of time at your statistics professor's office hours every week. I'm pretty sure Dr. Leitchie was like, are you here again?
I was like, yeah, I really need to know how this works now.

And I really noticed what it can do for you if you have a research question and somebody else has data that you can use for that question. You don't have to spend resources that you don't have. And why would we even spend all these resources, the time of the kids I needed to assess and the time of the assessors, to get data that somebody else already has? It just seems like a waste of time. For people like me who didn't have those resources, I think that's very important, because it enabled me to do what I wanted instead of being set back by not having that opportunity.

So I think that transparency really can help early career researchers, and faculty at under-resourced institutions or from minority backgrounds. We do know that researchers with disabilities and from minority backgrounds have lower chances of getting grants, for example, to do their own experiments, and sometimes that can also be because they didn't have the opportunity to collect pilot data, and you need resources for that too. So I think if we share our stuff with other people, we give them the opportunity to become better researchers.
And I think that will help equity in our field, so that it's not a few people getting all the money and all the things while the rest of us sit around because we didn't have those opportunities. I think it will allow people to answer their research questions without having to expend these resources, and it will also allow them to replicate studies very precisely, which can be really beneficial for your science as well.

So that was a lot about what I have experienced, and I think there were some tips in there already, but I want to move on, unless you have questions. I didn't say that before, but you're welcome to just blurt out your question if you have any. I want to do some tips and tricks about sharing, what you can do when you're starting a new project, and how that might work in your situation.

We'll start with materials, and I'm starting with that because it's probably the easiest thing to share. Why is that? It's because you have most of the materials you use in a project on your computer digitally anyway. So really all you need to do is drag that one folder into a data repository and you're done. It's not quite that easy, but it's almost that easy.

So what things should you share in a project? Basically, you need to share anything that someone else might need to replicate your study. That can be your study protocols; any assessments that you used; if you do an intervention, the stimuli you used for it; a walkthrough of an intervention describing exactly what people are doing; specific instructions for smaller parts of a project; and blank informed consent forms, which can be very helpful. If you had physical materials, you can use pictures or descriptions. I have an example here: this is a letter board where kids can put different letters together to make new words.
So you can put pictures in there so people can see what that looks like. You can share video or audio footage of an intervention, if you have permission to do so, to show people what the intervention looks like. It can look like one thing on paper but look different in practice, so you could share that. And then anything else you can think of that people might need to replicate your work. By sharing everything that someone might need, you let them either replicate your work exactly the way you have it, or tweak it a little bit to work better in the situation they are in.

Like I said, sharing your materials is super easy, because basically you just move that whole folder over from your computer to the cloud. You could post it to a data repository such as the OSF, figshare, or LDbase. That has the benefit of assigning a DOI to your materials, so that they are a citable product of your project. Then you can copyright them, and other people can still use them even though they're copyrighted, instead of you saying no, you can't use this. You can also choose to put them as supplemental materials to your journal article. I prefer putting them in a data repository because I keep the copyright. When you put them as supplemental material in your journal article, you put the copyright with the journal, which I think can be a disadvantage. Also, if the journal is paywalled, then people won't have access to your materials anymore. So it's your choice, but personally I would use data repositories to stay within the open science movement.

So what are some things to think about when you start a project and you want to share your materials?
It's a good idea to set up a specific lab notebook where you document your whole workflow, and then any deviations from the protocols that you had. This is important to note as you go along, because sometimes it's small differences that you won't actually remember later on in the project but that turn out to be really important for people replicating your work.

The second thing is to select what you want to share and have a designated folder within your big project where you keep these materials as a clean copy in an adaptable format. With that I mean: if you share your materials as a txt file, for example, or an HTML file, more people will be able to access them and tweak them, if you want them to be tweakable. PDFs might work, because most people have a PDF reader and you can get a free one, but they're not as easy to adapt. So if you really want to do more open science, use non-proprietary formats such as txt and HTML.

Then I do encourage you to get copyright on your materials. Not so that you can refuse to share them, but so that you can share them and people will still cite your work. There are a lot of licenses out there that you can use. I don't have very specific knowledge of these licenses, but there are resources on OSF. When you share something, they will ask you what license you want.
If you don't want people to be able to change your materials, there's one license, and if you want people to cite you but be allowed to change your materials, there are different licenses for that. It just gives you the option to share your stuff with other people and give them permission to use it while you still keep the original rights to your materials.

If you have used the work of other people, and in education I'm thinking of standardized assessments, or other people's assessments or surveys, you should check the copyright of those materials to see if you're allowed to share them. And you don't actually have to share anything that's publicly available. If you used, for example, the Woodcock-Johnson Tests of Achievement, I should be able to get them myself because they're a commercially available product. I maybe have to pay a thousand dollars for them, which is sad, but you don't have the right to share that on your materials page. If you said, oh, I used the Woodcock-Johnson, I'm like, okay, I can get access to that. So if it's something that somebody else made, check the copyright to make sure that you have the right to share it with other people. If you used a survey that somebody else made and then you made adaptations, you should make note of those adaptations if you're not allowed to share the survey, so that people at least know what you did differently.

And then add your materials to your CV.
They have a DOI. It is something that you made, and it should be on there so that people can see what you're doing.

Moving on to code. With code, what I really mean is an annotated workflow that details all the steps that you took in a statistical analysis, beginning with the raw data and ending with the final statistical results. Now, some of you may do qualitative research, which I don't know a whole lot about, but you can also share your "code". I use quotes because I don't mean statistical code, not because I don't value your research. Take your coding protocol, for example: how did you come up with themes? That whole process of coding your data could be considered code in qualitative research. And if you are a single-case researcher, you can include phase change and design decisions, and if you have coding for specific behaviors, like observational codes, all those things.

Just as with materials, you can share your code by putting it in a data repository. That will give you a DOI for your code, which is great, so other people can cite it if they use your code. And you can again share it as supplemental materials in a journal, with the same advantages and disadvantages that I talked about for materials.

What should you think about, when you start a project, if you want to share code?
The first thing is your choice of analysis environment. Sometimes your choice will depend on what type of analysis you're doing, and sometimes it will depend on the availability of different programs, but I think mostly it's a matter of personal preference. I prefer to use R because it's an open environment. But it doesn't always function the way I want, and sometimes I have to make analysis decisions that I usually wouldn't make, just because R doesn't have the functionality that I need. For example, if you do multilevel models in lme4, it doesn't do full information maximum likelihood, if that makes sense to you, so you have to do multiple imputation, whereas if you used Mplus you could just use full information maximum likelihood. So which one you choose depends on what you want: what are you most comfortable with, and what has the functionality that you need?

But whichever environment you choose, annotating your workflow in detail is the most important thing, and annotating your workflow is really, really hard. I'll share some examples of code that I wrote for two projects that are about 10 months apart, and you can see the difference from where I started to where I finished. You want to annotate the analysis decisions: why did you drop a certain variable, or why did you decide to change the model a little bit based on the outcomes of a previous analysis? But you also want to annotate the analysis itself, the code: what is this code doing? What step of the analysis are you in here?
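
To make the two kinds of annotation concrete (which step you are in, and why you made a decision), here is a minimal sketch. It is written in Python only for illustration; the talk itself works in R and Mplus. The data, the variable names, and the zero-variance rule for dropping a variable are all hypothetical and are not taken from the speaker's projects.

```python
"""Illustrative annotated workflow. All data and variable names are hypothetical."""
import statistics

# STEP 1: Load the de-identified data.
# Inline here so the sketch is self-contained; a real project would read a file.
records = [
    {"id": 1, "pretest": 12, "posttest": 18, "attendance": 20},
    {"id": 2, "pretest": 15, "posttest": 19, "attendance": 20},
    {"id": 3, "pretest": 9, "posttest": 17, "attendance": 20},
    {"id": 4, "pretest": 14, "posttest": 16, "attendance": 20},
]

# STEP 2: Screen the candidate variables.
# DECISION: drop any variable with zero variance, because a constant cannot
# explain differences between children. In this toy data every child has the
# same attendance, so "attendance" is dropped.
candidates = ["pretest", "posttest", "attendance"]
kept = [
    name for name in candidates
    if statistics.pvariance([row[name] for row in records]) > 0
]

# STEP 3: Compute a gain score (posttest minus pretest) for each child.
gains = [row["posttest"] - row["pretest"] for row in records]

# STEP 4: Summarize. This is the number that would be reported.
mean_gain = statistics.mean(gains)

print("Variables kept:", kept)
print("Mean gain:", mean_gain)
```

The comments carry the information a later reader needs: the step labels say where in the analysis each block sits, and the DECISION note records why a variable disappeared, which is easy to forget a year later.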
If the stats program that you're using doesn't allow you to annotate the code itself, you can take screenshots of the different steps that you've taken. I have an example of some Excel screenshots that people have made for graphing.

Then you want to include details about the software, the version, and the packages that you used, so that when people want to rerun your analysis they can do it with those packages and with that setup. I say this because I use R, and it updates all the time. Sometimes when it updates, your code doesn't run anymore because your code is old and stuff changed. But if you said, okay, I ran this on this date with these specific packages, then people can evaluate whether that worked. To combat that, I also like to share my input and my output, so that people can see what came out of the analyses without having to rerun them themselves. They can check what you did to see if it's right, or if they agree with your decisions.

So I promised some example code. This is code I wrote maybe a year and a half ago. That was my first try. I do want to say that I wasn't quite sure whether I wanted to share this code, because sharing code is pretty scary to me: what if I made a mistake, and then people say, why did you do that? Not that anyone will ever read my code, but here you are. The code is in white and my annotations are in yellow, and you can see there's just one annotation: it's "to get estimates for this model". I can read through it and I kind of understand what I did, but I could have done better with my annotations.

So this is my latest one.
You can see the increase in my annotations. I have annotations where I say, this is the step of the analysis that I'm in, and this is what I'm supposed to do. I have notes about what some parts of the code did, and I have the decisions that I made based on the outcomes. To me, the second example is much better, because I can go back and understand what I did. Sometimes I don't really understand what I did a year ago, because I probably had to google it 500 times and I don't remember. So sharing code and getting code ready to share is really not easy, but you do get better at it as you practice it more often.

And here is an example of what I said about sharing Excel files. This is from an article by Erin Barton and Brian Reichow in 2012, where they teach people how to use Excel to make graphs, and they had these specific screenshots. You could do this in your own project, screenshotting things. I think it would be pretty tedious if you have to screenshot every option.
But you could also write out that workflow if you need to, if your program doesn't allow for annotations.

So here are some tips and tricks for sharing code. I would separate your analysis into cleaning code and modeling code, or at least select a point where your data is de-identified. You don't want to share identifiable data. So if you have cleaning code up to de-identification, you can save that data set separately and start your shared code from there. If you still have a lot of cleaning to do, I personally prefer to do that separately and then have separate code for the analysis itself.

You want to confirm that your analysis can be rerun with the data that is provided. I work in R, so I clear my whole environment and say run, and then hopefully everything still works. I think this is a very important step before you share your stuff. And I do this after a while: I don't look at my code or the analysis for a while, then I come back to it, rerun it, and read through it to see if I still understand what I did. If I don't understand what I did, that means it needs more annotations. So check by rerunning that everything is still functioning the way it's supposed to. I suggest over-annotating. It is slightly tedious, but you get better at it.

And think about the people that might use your code. Think about past Willa of five years ago, who didn't understand how to do a multilevel structural equation model. Then you read someone's code and it says, oh, you just do this, or, all you need to do is, and that can be very frustrating and intimidating, because I don't know what's going on. It's not "just" to me. So think about who will read and use your code and adjust your language accordingly. You don't have to baby them, but coding is scary to a lot of people.
So doing as much as you can to make it less scary is, I think, very important. If those are words you're going to use, you probably need more annotations for other people.

I would share my code in a compatible format. If you use Mplus or SAS, that's great, and you can share that. But if you save it as a txt file, everyone can open it and read it even if they don't have Mplus or SAS on their computer, which is very helpful. If you work in R, I prefer R Markdown documents that are shared as HTML files, which people can pull up in their internet browser. That has the advantage that it puts your input and your output together.

This is just a little example. I had birth dates in some of my data, and this code shows where I changed the birth date and the test date into age at testing. This is code that I wouldn't really share with people, and in my Markdown file I have it set to not be shown, so when it gets made into an HTML file you actually don't see this part of the code. The data that I would share would be saved after I did these steps, so the dates will be totally out of the data. Those are things you should think about when you share your code and your data.

So, maybe the most important part is your data. And I have about 15 minutes, right? I think we can just make it.

What should you share? Well, it's your raw but curated data set that includes data at the individual level, and preferably at the item level. So even if you use a standardized assessment,
I would share the answers for each separate item instead of the total score, because some people might be interested in item-level data. You can do two things. You can share a subset of your data that is connected to a specific analysis or paper that you did. You can also just share all your data, which again is preferable: all the data that you collected in a project. And I want to note that data is not the same as records. Data is your digital data, and the records are the paper versions of somebody's assessments, for example. So those two are not equal. But preferable is to share all your data at the item level for your whole project. And then besides your data, you also want to share your metadata, which includes a codebook, which I will talk a little bit more about in a second, and other details about your study. So what were the aims of your project? What's the information about the sample in your study? What were the data collection procedures like? And what about missing data: what does it look like? This is to help people who want to use your data understand the context of these data.

I have a question.

Yes?

This is a lot of work, to describe all of this again, and it's most likely already described in the publication. Why do it again?
So if your publication is behind a paywall but you have your data open, then the people that want to use your data may not have access to it. If you have this on a project page, so on OSF for example, or on LDbase when we go live, you can do these descriptions for your project and then attach your data set to it. Since you already have it, it's really not that much work to copy-paste it into a separate something for this. But, you know, if I have your data and you say, well, this was the treatment group and this was the control group, then I do want to know what that treatment looked like. So it's about having a little bit of extra information there. But yeah, it is a little bit of extra work.

Is that actually legally possible, especially when my paper is behind a paywall, that I just copy and paste it?

Well, you may want to rephrase some things. Yeah. Which is why you shouldn't publish behind a paywall, and should pay the five thousand dollars it costs to publish open access, or do a preprint. But yeah, legally, probably don't copy-paste, but reword it in some way. Yeah, you're right.

What should you do? You should write a consent form for your project that includes a statement about sharing your data. This is very important. You should make a data management plan, and probably include a data manager in your project if you have funds for that. You need to clean and de-identify your data set. You need to create your metadata, identify the repository you want to publish your data at, and decide on access restrictions. And then you want to upload your data in a format that's universal, so that everyone can use it. So again, if you upload your data as an SPSS data set, then, you know, not everyone wants to pay for SPSS. It's possible to convert that in R, but it's just easier for everyone if it's in a CSV file, for example. So I'll go through some of these in a little bit more detail.
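The clean, de-identify, and universal-format steps in this list can be sketched in code. The talk's own workflow is in R and is not shown in the transcript, so what follows is only a minimal Python sketch under stated assumptions: the column names (`name`, `birth_date`, `test_date`, `score`), the example rows, and the helper functions are all hypothetical. It replaces the birth date and test date with age at testing, drops the direct identifiers, writes CSV text, and then runs a toy analysis that starts from the de-identified file only.

```python
import csv
import io
from datetime import date

# Hypothetical raw records; column names and values are invented for illustration.
RAW_ROWS = [
    {"name": "Student A", "birth_date": "2012-03-15", "test_date": "2019-10-01", "score": "12"},
    {"name": "Student B", "birth_date": "2011-11-30", "test_date": "2019-10-02", "score": "17"},
]

IDENTIFIERS = {"name", "birth_date", "test_date"}  # assumed direct identifiers


def age_in_months(birth: date, test: date) -> int:
    """Whole months between birth date and test date."""
    months = (test.year - birth.year) * 12 + (test.month - birth.month)
    if test.day < birth.day:
        months -= 1  # the current month is not complete yet
    return months


def deidentify(row: dict) -> dict:
    """Cleaning step: add age at testing, then drop identifying columns."""
    birth = date.fromisoformat(row["birth_date"])
    test = date.fromisoformat(row["test_date"])
    clean = {k: v for k, v in row.items() if k not in IDENTIFIERS}
    clean["age_months"] = age_in_months(birth, test)
    return clean


def to_csv(rows: list) -> str:
    """Write rows as CSV text, a format any program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def mean_score(csv_text: str) -> float:
    """Analysis step: starts from the de-identified CSV only."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(float(r["score"]) for r in rows) / len(rows)


shared_csv = to_csv([deidentify(r) for r in RAW_ROWS])
print(shared_csv)
print(mean_score(shared_csv))
```

Keeping `deidentify` and `mean_score` apart mirrors the split between cleaning code and analysis code: only the second function, together with the CSV, would need to be shared. Removing these columns is not a complete de-identification procedure, since combinations of other variables can still identify people.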
The first one is consent forms, and this is really a super important step, because this will either allow you to share your data or not allow you to share your data. I have some very specific language on the next slide that you can find in Shero and Hart (2020), and on the OSF page I have all the citations for you in a separate file. What you want to do in your consent form is have a statement that indicates that data may be shared with other people. You want to avoid restrictive language. Restrictive language is something I have used a lot in my consent forms before: saying that these data will only be kept for seven years and accessed by the project PI and affiliated people. That means you can't share it, because other people are not affiliated with your project, and you're basically done. You don't want vague language either, like "these data will be kept secure to the extent of the law," because the laws in the U.S. are very different from the laws in Europe, and even within the U.S. the laws in one state can be very different from the laws in another state, and laws might change. So what does that mean? This is language that I have used in my own consent forms, and now I'm like, oh well, that was a mistake, don't do that again. And you want to be consistent in this language across the people in your project. So if you have parent consent forms and student consent forms and teacher consent forms, you want to have the same language for all these people. So for example, instead of saying, "All information will be kept for at least seven years in a secure location, and only project staff will have access to it,"
which is very restrictive, you can say: "The original paper records and identifiable electronic data will be kept for at least seven years, but data with all identifiers removed may be used for future projects that focus on any topic and may be unrelated to this study. This new data may be made available to the general public via the internet and an open database." So that's example language that works for most IRBs, at least here in the U.S. If you're in a different country, you might want to check that, but there is language in the Shero and Hart white paper on working with your IRB and writing consent forms.

Maybe the most daunting and time-consuming task that is necessary for data sharing is managing your data. I was going to detail a lot of this, but since I'm running out of time, because I'm lengthy as usual, you may want to check out the data management hackathon that happens tomorrow at 10 o'clock with my colleagues Tara Reynolds and Chris Schatschneider. They will tell you all about how to manage your data in a way that is helpful. They also have a white paper out that I've referenced on the citation list. But really, you want to spend a lot of time, before you actually collect your data, thinking about how you're going to manage these data. It will really help you avoid data getting lost, like the actual paper versions of it, and avoid data entry mistakes, and you won't have to clean as much later if your data entry system is really good. They have really great tips and tricks that I could spend time on now, but I will just continue with my slides and refer you to their paper and their hackathon that happens tomorrow.
I'm going to do the same with cleaning and de-identifying. Also tomorrow, at 11, my graduate colleagues Ashley Edwards and Jeffrey Shero are doing a workshop on data de-identification. Very briefly: you want to remove everything like names, social security numbers if you even have those, birthdays, addresses, and all those things. But there are some other things that people can do to re-identify people by using different variables in your data set, and they can talk to you about how you avoid that and what you do. I think tomorrow they're going to demonstrate that and help people out if they have problems with de-identification.

Then the metadata. For your codebook, you want the names of all the variables with their labels, what the values are, and what the recoding strategies were, if you, say, reverse-coded an item for your analysis. You also want the specifics of the data set: if you have a longitudinal study, what wave was this? Was this teachers only, or is it teachers and students? There are some really good resources out there, like the Data Documentation Initiative, that can help you with creating a codebook. But my best tip here is to work with your librarians. There are amazing librarians out there who have so much knowledge about open science and data documentation and data storage, and they can really help you get set up in the right direction. So the metadata, which we just talked about, is really just details about your data and your study that other people might want.

Then you should identify a repository where you want to put your data. I know in Europe they have pretty heavy restrictions, or regulations, about where you can share your data. The U.S.
is not that picky, apparently. But they're different. I am from Europe originally, so I do know about all the regulations. There are different kinds: interdisciplinary repositories, discipline-specific repositories, and grant agencies sometimes have their own repositories as well. So just check out what works with what you want. What would be the audience that works really well for you? I'm going to pitch LDbase, because I work for them and we're amazing, and we will be live on March 15. But if you have qualitative data or video data, there are more specific repositories that would work for your data. So spend some time identifying where you want to put your data.

So I'm doing great on time. Like I said, I was going to have this whole section on sharing elements from a finished project, but I could only come up with one slide, and that is: it takes a lot of time to share stuff if you're already finished, because all the steps that I just talked about, the tips on things to do before you start, you now have to do afterwards, which takes time. And we don't always want to spend that time. So that's something to think about: going back through your code and annotating it so other people might understand it, or getting your data into the shape that you want, or not having done your metadata and thinking, well, I'll have to copy and paste from this other document into here. It's some work. The only thing you really need to pay attention to is your consent forms. I talked a little bit about the specific language that you need. Your old consent forms may not have that language, and that means you can't just share stuff, especially the data. But you can work with your IRB. Sometimes they can give you a waiver when you say, well, I didn't have this in my original consent form, but I want to share these non-identifiable data to have other people look at it.
And then sometimes they give you a waiver that says, okay, that's okay. But if you don't get that waiver and you still share your data, you will be in trouble. Everything else is just the same. It just takes more time afterwards to get your stuff fixed up the way that you would want it to be.

And that's it. That's all I have, and it is exactly 12 o'clock. So yay, go me. I have these slides up, I have a document with citations, and I have the examples of my code. And if there are things you would like me to find the materials for, let me know, and I can try to put that on the OSF project page. Does anyone have any questions?

Philip and I just started to publish results, and I found it really helpful to split my R document: one part goes from the raw data to the use data, and I keep it in a special folder, so people who are not interested in the raw data don't have to go through all of these steps. But I haven't seen that so far. It really felt natural to me, but I thought maybe there are reasons why others do it differently.

So to share just, like, summary statistics? Is that what you mean?

No, I split my R document. So I have one document that just transforms the raw data to the use data, and then another document that just loads the data set that results from the first one and then does the analysis.

Oh, what I described as the difference between cleaning and the analysis itself.
To me, that seems natural, because that's how I do it. I like to have different Markdown files for each one, but I could see that it might be confusing for other people. I don't know if there are conventions for that, actually. Yeah, to me that would make total sense. But yeah, I don't know what to tell you. I think you did the right thing.

I would have one question, maybe a bit specific, but I would be curious whether you have any recommendations for generating machine-readable metadata in R, such as the codebook package, dataMaid, or dataspice. I mean, there are several out there. What would you recommend, or are you using any of them?

I'm not using any of those. I've looked at the codebook package. It didn't really do what I wanted it to do, I think, because it's more like: this is the variable, this is the label, and here's a range of values. That's kind of what I got out of it. I don't know off the top of my head, but there are some really good codebooks out there. Actually, the study that I was mentioning earlier, the National Center for Teacher Effectiveness, in math, had a really good codebook. But I haven't really dabbled around with it a lot.

Would that be findable, like machine-readable? Because that's what I liked about the codebook package.

Yeah, it's actually just a PDF, so not really. I think the [unclear], maybe the data alliance, has some more information on that, because I have been reading about it. I just haven't tried it yet, because I use secondary data, so I just get people's codebooks and use those.

Okay, yeah, it was a very specific question. Sorry.

Yeah, no, it's a great question. If you go to the data de-identification one, you should ask Ashley Edwards, because she's done a lot more looking into that too. Just don't tell her I told you to.

I have just a really small question as well. First of all, thanks so much for this talk.
It was really great. Something that you said that I never really considered before was putting your shareable materials on your CV. I'm not sure that I've really seen that, but I think that's lovely and brilliant, and I'm just wondering if you have any more feedback or advice about where that would go, whether it would go right underneath the publication, or if you have any examples that would be helpful in seeing what that would look like.

I haven't actually shared any materials to put on my CV, sadly. What I've seen people on Twitter do is have different sections. So below your publications, have open materials, open code, open data, and keep that same structure. And maybe have the badges. I've also seen people put badges behind their publications, like this is open access, this is open code, and then you can link back to that next section. Personally, I think that's how I would approach it: like, this is material that I have shared.

Great, thanks.

Are there any more questions? I know it's lunchtime here; it's dinnertime in Germany. I saw a lot of people from Germany. All right, well, thank you guys so much for coming, and for keeping your video on and nodding from time to time. It's very encouraging when you're staring at Zoom. So again, you can find me on Twitter at Willa Van Dyke, and if you want to email me, it's w van dyke, my last name, at fsu.edu, and I can put that in the chat. Oh, there's all kinds of... Sorry, I didn't actually look at the chat, because apparently I'm not great at Zoom.
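The codebook contents described in the talk (variable names, labels, values, recoding notes, and data set specifics such as wave) can also be kept in a machine-readable form, which is what the Q&A question was after. The packages named in that question are R packages and are not used here. This is only a minimal Python sketch: the two variables, their labels, and the helper functions are invented for illustration and do not follow any particular metadata standard.

```python
import json

# Hypothetical codebook: variable names, labels, allowed values, recoding notes.
CODEBOOK = {
    "dataset": {"wave": 1, "respondents": "students"},  # invented study details
    "variables": [
        {"name": "item_01", "label": "Reads word list item 1",
         "values": {"0": "incorrect", "1": "correct"}, "recoding": None},
        {"name": "item_02", "label": "I dislike reading",
         "values": {"1": "strongly agree", "2": "agree",
                    "3": "disagree", "4": "strongly disagree"},
         "recoding": "reverse-coded for analysis: 5 - original response"},
    ],
}


def undocumented_columns(columns, codebook):
    """Return data columns that have no codebook entry."""
    documented = {v["name"] for v in codebook["variables"]}
    return [c for c in columns if c not in documented]


def invalid_values(rows, codebook):
    """Return (variable, value) pairs that the codebook does not list."""
    allowed = {v["name"]: set(v["values"]) for v in codebook["variables"]}
    return [(k, val) for row in rows for k, val in row.items()
            if k in allowed and val not in allowed[k]]


# JSON text that could be shared next to the CSV data file.
codebook_json = json.dumps(CODEBOOK, indent=2)

data = [{"item_01": "1", "item_02": "4"}, {"item_01": "0", "item_02": "7"}]
print(undocumented_columns(["item_01", "item_02", "age_months"], CODEBOOK))
print(invalid_values(data, CODEBOOK))
```

Because the codebook is plain JSON rather than a PDF, another researcher's script can read it and check the data against it, as the two helper functions do here.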