Welcome all to our second Data Palooza session on data management plans. We're just getting started; I think we may still have a few stragglers, but we wanted to get going. I'm Al. I'm the research data librarian here at the USU Libraries, and today's session is on data management plans. I also wanted to introduce my co-host and Zoom host, Zoom manager, Mike. Would you like to say a word of introduction? Good afternoon, everyone. My name is Mike Shelton. I am the library assistant for our DMS, and Al and I work together to help the campus community with your data management needs. I will be acting as the host today, so if you have any questions or any problems, please send me a message in the chat and we'll try to address those. Yes, so thanks and welcome. We're really happy to have you here. Today's session focuses on data management plans. It seems like we're a pretty small group, and so there were two things that I wanted to start with. One was, as with yesterday, to ask you to introduce yourselves. There were a handful of you that were there yesterday, so hello again, and thanks for coming back. If you could introduce yourself with your name, your department, and your research interests, and today we said favorite kind of ice cream. You can share that again if you like. The other thing that I wanted to check about is that we're going to workshop a couple of sample data management plans in the latter half of this. If anyone has their own data management plan that they would like to workshop, let Mike know and we'll set it up as one of the little discussion sessions. Otherwise, we have a handful of sample ones that we were going to workshop through. So, anyway, I'm the research data management librarian at the library, and my research interests have largely been around data practices in the sciences. I've done a lot with astronomers and with medical records and health data.
My favorite kind of ice cream: yesterday I said all flavors, today I'm going to say chocolate. So, Mike, you want to go next and then maybe call on someone? Yeah, absolutely. Again, my name is Mike Shelton, I'm the library assistant for our DMS. Beyond library science and data management issues, my academic background is in history; I really enjoy that. And my favorite ice cream is cookies and cream. Sarah, would you like to introduce yourself next, please? Yeah, happy to kick us off here. My name is Sarah Harper. I am a postdoctoral fellow in the Department of Physiology and Health Sciences. My research interests vary quite a bit, and the easiest way to sum it up is kind of exercise and aging and the crossover that encompasses those aspects. And yesterday I completely forgot to even mention my favorite ice cream, which is chocolate peanut butter. I wanted to jump in really quickly: I would also love to hear more about your data and just what kind of data you have. Yeah, and I also need to give a disclaimer: we are recording this. We want to make it available to other folks who aren't able to be here live, so FYI, and if you have any issues with that please let me know and we can address that after the fact or whatnot. So, thank you. To elaborate a little bit more on my data, to that point: first of all, glad it's being recorded; you know, it's great to share these resources for everyone involved. Let's see, I have a few data sets; two primary ones come to mind at the moment. I have a data set that is associated with my American Heart Association postdoctoral fellowship. That involves a collection of muscle and blood samples and their respective outcomes, like markers of systemic inflammation, reactive oxygen species, and lactate testing. Let's see. And some mitochondrial enzyme-based outputs.
In addition to some functional outcomes from those older adult human samples, so everything from their age to how well they walk across the room, and some measures from that perspective. My more recent data set is from an observational trial we just finished here on campus, where we were observing stair negotiation: simply observing how individuals walked up and down two different staircases, with one staircase that had a visual contrast applied. And then we removed the intervention and did it on a second staircase at the same time, so looking at the effect of the staircase and the effect of the intervention on how people moved on the staircase: did it reduce any risk factors for slips, trips, falls, things of that nature? So hopefully that gives you a better overview of two more recent projects. Very interesting, thank you. Who would like to go next? Okay. If I pronounce either of your names wrong, please forgive me and correct me. Let's see. I can go next. Hi, my name is Patricia. I am a PhD student in civil and environmental engineering. My research currently is on wastewater treatment and anaerobic digestion. I have a lot of data from a wastewater treatment plant that I was analyzing, and I also have data that I collected from the lab all of last year in anaerobic digestion. Yeah, and my favorite ice cream is still lemon custard. Good choice, thank you. I hope I pronounced this correctly. Would you like to go next, please? Oh, thank you. They've introduced themselves in the chat there. Oh, great. PhD student. Dr. Taurus. Yes, you work with Dr. Taurus, right? That's correct. Oh, you have a microphone. Okay, that's great. We understand. Thanks for typing in the chat. And Blake just joined us. We are doing introductions really briefly. Would you like to introduce yourself as well? Sure. My name is Blake Tullis.
I'm the associate vice president for research and have the pleasure, among other things, of working with the data management group. I come from the Department of Civil and Environmental Engineering and worked at the water lab prior to coming to Old Main. I'm happy to be here and appreciate the training. Well, thank you. And yesterday we covered a little bit of this, so I may skim through this part to not bore you too much, but: why worry about data management? Why is it such a big deal? Why do you think it's important to invest time in data management? I'll pose that question to myself and the other participants. Okay. That was a general question, sorry. It's critical; it's a foundation. You are trying to make inferences or statements about the data, and in your findings you need to be able to clearly and efficiently back that up. They need to be reliable; we need to be able to replicate it. So easy and functional access is going to be critical. And this is a little cartoon that also talks about why data management is important; this is a joke about naming conventions. You always think you're going to remember it at the time, and then you come back to it later and it's not at all obvious. So, this is something that we touched on briefly yesterday: why worry about data management? It makes your grant applications more competitive. It increases the likelihood that your papers will be cited by others. You'll forget details if you don't document your data. You can build more complex data sets and automate them more easily. You can increase the likelihood that you'll be able to recruit talented students and advisors to your research project. It increases the likelihood that your research can be reproduced. And if it was worth collecting, it's worth sharing with others and documenting adequately.
Your students will thank you later; the people who use your papers will thank you later. It gets easier with practice, and you can publish data papers describing your data. And FYI, Elsevier just released a whole series of publications that are data journals; they call them their Research Elements journals. Their premier research elements journal is called Data in Brief, but there are a lot of different journals now that focus just on data publications, in addition to some of the major journals adding a data publications section. So, moving on. This was something we also touched on yesterday, so I'll go through it quickly. Many people think that putting things on the internet makes them accessible; most people don't realize that the half-life of things on the internet is two and a half to five years, which means that in two and a half to five years over half of the links are no longer functional. And when I said that there was a substantial citation advantage to making your data available: this is one of the more recent studies, which just came out in 2020, I believe. There were some earlier studies, like 2013 and 2015, that showed a substantial citation advantage; those were focusing on the biomed disciplines. This is a much larger study looking at PLOS articles and BioMed Central articles: they had about 20,000 in PLOS and 30,000 in BioMed Central, and they found a substantial citation advantage for the ones that actually made their data available in a repository. I think that a majority of them had some kind of statement like "we're making the data available," but only about 20% of them actually made it available in a repository. So that's, you know, as an academic, a major reason to make your data available and findable and citable and all of that. So, moving on to data management plans.
Why are they necessary? We touched on it a little bit yesterday, talking about that OSTP memo from 2013 that was about requiring the products of publicly funded research to be made available to the public within a reasonable time. One of the things they implemented with that were these data management plans, and so many federal funders require data management plans for grant applications. They vary from agency to agency and from division to division within those agencies. But the point of doing a data management plan, from the perspective of the funders, is also often to encourage researchers to think through their data sharing, data collection, and data archiving plans before beginning to collect the data. And I'm getting to that. The other thing that's important to realize when you're doing a data management plan is that when you're applying for federal funds, you're applying for public monies. So when you're doing a data management plan, it often becomes a public document as well. When you write a data management plan, it's important to remember that other people beyond the grant agency may also be reading it, and it will reflect on you as an academic and on your research competencies. So it's important to think through the data management plan in terms of the audience: both the grant funders that you're sending it to and also how it reflects on you in terms of your research methods. So, ten questions to ask yourself before writing a data management plan. I think it's really important to ask yourself these questions, because you have to think through the process: what is your data, what are you collecting, how are you going to store it, where are you going to put it, how is it going to be organized, what kind of file structures are you going to use, before you can really write a very effective data management plan.
We have some really great tools that we're going to point you at, but you kind of have to have a game plan before you start to use those tools. So, the ten questions that I recommend you ask yourself before starting to write a data management plan are: What kind of data set is it; what kind of data are you collecting? Who will be responsible for the data? Is it just you, or do you have collaborators? Who is going to use the data; who's the audience for it; who are you going to be sharing it with? How should you describe it; are there metadata standards you're using? Are you going to need a data dictionary? What kind of headers are you going to use if you're creating spreadsheets? Are there restrictions to using your data? Are there confidentiality issues? I think, Sarah, with your data I imagine there are probably HIPAA and human subjects issues with it; is that right, or is it fully anonymized? Sorry to put you on the spot. Yeah, it looks like Sarah has actually stepped out. But that's all right. I can't actually see everybody, so I'm flying a little blind here. Anyway, with the type of data she was describing earlier, there are often HIPAA restrictions, which are health record restrictions, and human subjects restrictions and IRB requirements. Depending on what kind of data you have, there may be restrictions to making it available, or it may need to be anonymized. There are also, as we discussed a little bit yesterday, some export control issues. Then there are questions of: where will I archive my data? Should it be put in a disciplinary repository? Are there repositories that are used in your field, or would you like it to go into a more discipline-agnostic, widespread repository used by many different disciplines? Lots of questions like that; some repositories are designed for code.
One repository that was developed at USU and is really highly regarded is HydroShare, and that's for hydrologic and hydrology-related data. Similarly: what tools were required to produce my data or to use it? These are all things that need to be documented before anyone else can use it. Then another important thing when you're doing a grant application is: how much will the data management process cost? How much will it cost to describe it, to package it, and to make it available to others and put it into an archive? The thing to bear in mind is that it can often be really difficult to make those estimates, but you can include them in the grant application as something for the grant to pay for. Another important question to ask is: am I planning to author publications based on the data? That's important because publications may require you to make the data available in conjunction with the article, and if that's the case, it's nice to be aware of it so that you have the data ready to go when they ask for it, and you're not scrambling after the fact when they have accepted the manuscript but need the data ASAP. And the most important question is: where can I get help answering these questions? It's often a process to work through all of these details, and those are the things that I recommend thinking through before getting to a data management plan. The obvious answer for the last one is that we're here to help with all of these questions and have resources available on our website that cover the different funders. So we're going to step through some of the typical components of a data management plan after looking at the questions that I recommend you ask yourself and think through, bearing in mind that we're here to help with data management plans, and our website has a lot of information about them.
At the end we also have some other tools that we recommend that can aid with putting together a data management plan. I wanted to pause here really quickly, though, and see if there were any questions on what I've covered so far. Are we good? So, while there is some variability in what goes into a data management plan depending on the funder, the agency, and even the division, there are a lot of commonalities. Typically, most data management plans include a section for expected data: what kind of data is it, and how do you plan to collect it? Then the roles and responsibilities, in terms of who is going to be doing what with the data: what is their responsibility, are you going to have a data manager, who's going to be responsible for archiving it, who's going to be responsible for describing it, and all of those questions. Then the data, metadata, and standards; data access and sharing; the policies for use; and the plans for archiving and preservation. Some also include a reporting and monitoring section. We're going to go through these different sections. So, the expected data: your data management plan should describe the type of data that you plan on collecting, how it will be captured, and the format. For example: survey data will be collected via Qualtrics survey software; it will be downloaded and saved in CSV format. Describe the types of data you will collect or produce, the format, the quantity, and the size. Where will the data be stored and backed up? Describe how much data you expect to generate: are you talking about megabytes of data, gigabytes, terabytes? Describe this for each type of data you expect to generate. Depending on the complexity of the project, you may have multiple types of data: you may have data, you may have code, you may have software. If special software is required, such as for modeling, include this as well.
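To make the "expected data" example above concrete (survey responses exported to an open CSV format, plus a size estimate for the DMP), here is a minimal sketch; the field names, sample values, and file name are hypothetical, not from any real study.

```python
import csv
import os

# Hypothetical survey responses (fields and values are made up for illustration).
responses = [
    {"respondent_id": "R001", "age": 34, "exercise_hours_per_week": 3.5},
    {"respondent_id": "R002", "age": 61, "exercise_hours_per_week": 1.0},
]

# Save in an open, plain-text format (CSV) rather than a proprietary one,
# so the data stays readable regardless of future software changes.
with open("survey_responses.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["respondent_id", "age", "exercise_hours_per_week"]
    )
    writer.writeheader()
    writer.writerows(responses)

# A DMP should estimate expected data volume (megabytes? gigabytes?);
# checking actual file sizes on pilot data helps calibrate that estimate.
size_bytes = os.path.getsize("survey_responses.csv")
print(f"survey_responses.csv: {size_bytes} bytes")
```

The point is not the code itself but the habit: export to an open format early, and measure real file sizes so the volume estimates in the plan are grounded rather than guessed.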
The other thing that's also important: it's often best practice to save your data in a standard format, and preferably an open format, so that it can be opened in the future as well. There have been a lot of problems with archived data in proprietary formats that have been difficult to port forward, less so in the sciences, but some of the AutoCAD formats, for example, are just nightmarish. Anyway, we have resources on our website: pages that talk about storing and archiving, about file formats, and about describing your data. That's what the links in these slides are, and they're also up on the Canvas site. So, expected data... there we go, roles and responsibilities. Most funders ask for a section on roles and responsibilities. In this section you want to clearly define who in the research process will be responsible for different aspects of data management: who will be responsible for managing and backing up the data, who's responsible for depositing it, who's responsible for ensuring that the terms of the data management plan are followed. For example, what happens if the PI leaves the institution; who's responsible then? Examples of roles include the data collector; the metadata generator, i.e., the person who writes the metadata, which is a fancy term for the person who does the descriptions of the data; the project director; the person or people responsible for data backup during data collection; and the people responsible for submitting the data to an archive or data repository. In a smaller project that might all be the same person; for something like a dissertation, it's all one person. For something like a multi-site grant, there may be multiple people for each role. For each task it's best to indicate who's going to perform it and make sure they have the appropriate skills; if they don't, you can provide training, and that can also be budgeted within the grant application.
The metadata standards: this is another really important part of the data management plan. It's really important to have clear and detailed documentation of the data, because it's critical for the data to be understood, interpreted, and reused. The data documentation describes the content, the formats, and the internal relationships of your data. It enables researchers to find, use, and properly cite your data. It includes things like data dictionaries and README files, which are required if you want to put your data into Digital Commons, for example. It includes things like variable names and descriptions, column headers, data dictionaries as I mentioned, explanations of codes and classification schemes, algorithms used, and file format and software version information. Are there any questions at this point? Everybody still awake, I hope. So, policies for access and sharing. This is important to think through, especially if you have any kind of sensitive data, confidential data, or data that may fall under export control. In this section you need to clearly identify how you will protect this information both physically and digitally; if you have human subjects data, that often involves locking it up and/or encrypting it. Funders expect you to share your research data, and in many cases having sensitive data does not excuse you from sharing it. You still have to clearly explain in your data management plan how you will prepare your data for sharing: how you will anonymize it, how you will remove those confidential aspects in order to make it available. That can sometimes be a pretty complicated activity, which is another reason why we're here to help and why we partner with the research office to figure out those questions. So, determining cost. Most funding agencies allow you to include the costs of managing your data in your proposal; creating, processing, sharing, storing, and preserving data is expensive.
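To make the data dictionary idea above concrete, here is a minimal sketch that writes one as a CSV file to sit alongside the data set. The variable names, types, units, and descriptions are all hypothetical, and funders don't mandate this exact layout; one row per variable is just a common, simple shape.

```python
import csv

# Hypothetical data dictionary: one row per variable in the data set,
# recording the name, type, units, and a plain-language description.
dictionary = [
    {"variable": "subject_id", "type": "string", "units": "",
     "description": "Anonymized participant identifier"},
    {"variable": "gait_speed", "type": "float", "units": "m/s",
     "description": "Usual walking speed over a short course"},
    {"variable": "visual_contrast", "type": "integer", "units": "0/1",
     "description": "1 if the visual-contrast intervention was applied"},
]

with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["variable", "type", "units", "description"]
    )
    writer.writeheader()
    writer.writerows(dictionary)
```

A dictionary like this, together with a short README, covers most of what the metadata section of a DMP promises: variable names and descriptions, codes, and units, in a form a future reuser can open with anything.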
So it's really important that you think through these costs early on in the planning process. It's also important to consider those costs both in terms of human resources, how many hours it's going to take to actually prepare the data and to document it, as well as the fees for storage, and in many research projects the human labor is far, far more than the storage fees. We have a couple of resources for helping to figure out costs; these are some worksheets that were put together by the UK Data Service. These links are also available on our website as well as on the Canvas site. So, any questions at this point? And if you don't mind, since I think we have a spare minute: Blake, I'm really curious about your experiences with doing cost estimates for grants. Is that something that you've had to do much? It's one of those classically difficult tasks. Yeah, so I'm, I guess, what you would call an outlier in that most of my funding has come from private sources, which hasn't had data availability clauses; in fact it's proprietary most of the time, so it's not to be shared. Well, some of it. So I have no experience, actually, with cost estimates for data management and storage and so on. I was going to say, though, with the number of years I've been doing research, and this isn't related to your question, it's just a side note: there's a lot to think about. Often what obviously comes to mind is tabular data, and, you know, storing that in a format, CSV I think is what it was called, in a way that somebody can use it with various software programs. But some data comes in, like, video format. For my PhD dissertation work I used a bunch of video, and that got archived on a certain kind of videotape, which tells you how old I am, I guess, and I don't know, you know, the chances of finding a player.
Like a seven-eighths-inch-thick tape, a bigger tape, kind of related to what they did for news broadcasts back in the day, and you had to have a special player for that. And, you know, if you think you might have use for your data, just updating that as technology advances: saying, well, I need to make a digital copy soon so that I don't lose quality, and then if the digital technology changes I should update those as well. It's something that's relevant and potentially useful in the future; sometimes you just don't need to worry about it, but sometimes it might be a continual process depending on what type of medium contains your data. And as a side note, magnetic tape, like cassette tapes and videotapes, is a classic and colossal headache for digital preservation, because it can be difficult to find a player, and it can be difficult to keep the player running, since it's a very mechanical kind of object. But also, whenever you deposit your data, whether it's with Digital Commons or another archive, there's almost always a clause that says that when you deposit it with them, they have the right to keep multiple copies and to reformat it as needed, so archives can migrate the formats as they go forward. But yeah, using open standards for formats makes it much, much easier for other people to use your data and also to migrate things to different formats going forward. The magnetic tape one is sort of an obvious example, and cassette tapes for audio recordings of things like oral histories are another kind of obvious one. Less obvious: there are a bunch of things like RealAudio and RealVideo files from the early 2000s and late '90s that all had to be migrated to more open formats, because no one uses RealPlayer anymore and it was all completely proprietary.
So even migrating out of that format was a giant headache, because there are all of these licensing restrictions on, basically, reverse engineering the format in order to be able to move it to another format. It's one thing to have the ability to play it; it's another thing to have it worth playing. You know, those magnetic tapes deteriorate over time, and eventually the quality may not be of any value. Well, that's what I was referring to when I say that they are a colossal headache: with the magnetic tapes, if you don't spin them up regularly they start to glue themselves together, and there are all these other problems with them, so from a preservation point of view there's just all of this stuff that you have to do with them just to keep them functional. That's way beyond paper: with a book, you can put it on the shelf, and as long as the cockroaches don't get to it and it doesn't get wet, it's pretty good to go for a pretty long time. The magnetic tapes are one of the most labor-intensive formats to keep functional, and they degrade over time. So anyway, long story short, that brings us to plans for archiving and preservation, which is what we were talking about with the tapes, and there are often issues of migrating the formats forward. That's also part of what a data management plan is trying to get folks to think about: how am I going to collect my data, how am I going to describe it, how am I going to store it during the project, and how am I going to store it for the long term after the project? Many, many public funders have a public access requirement. And it's important to know that if there is a public access requirement, archiving your data on a personal computer, a lab computer, or a server is not adequate. The data need to go to a public repository such as Digital Commons or one of the discipline-specific repositories.
And so we can help identify an appropriate repository. It depends on your data, and that's often why we have consultations with folks: to figure out what kind of data they have and what the best place for it to go is. Were there any questions on that part? So, there are some tools that help put together a data management plan. The DMP Tool is a really nice one; it walks you through the different sections. It has templates that are specific to the different funders and divisions, so you can go in there, plug in what your funder is, plug in the division, and it'll pop up a template that shows you which sections are required and gives guidance for filling out those sections. I really recommend thinking through those ten questions that I outlined first, because you kind of have to have a game plan before you start plugging in those sections, such as the expected data or the costs or the plans for archiving and preservation. But the DMP Tool and the templates that it offers are really helpful in that way. They also provide guidance where they walk through types of data, file formats (like we were talking about with using open file formats like CSV), organizing your files using descriptive file names, metadata and data documentation, and persistent identifiers. The nice thing about using Digital Commons and many of the other repositories that we recommend is that they automatically create persistent identifiers, or we do the work of creating them for you, so that you don't have to worry about that part. But the DMP Tool has guidance on all of these subjects, and it's well written; it's really helpful to look through. They also talk about security and storage, sharing, archiving, citing your data, as well as issues like copyright and privacy, looking at confidentiality and things like that. There are links here for all of that. So, we wanted to workshop some sample DMPs.
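Before the workshop portion, one quick illustration of the descriptive file naming guidance mentioned above: a minimal sketch of a consistent, sortable naming scheme (project, measure, date, version). The project and measure names here are hypothetical, and this is just one reasonable convention, not the only one.

```python
from datetime import date

def data_filename(project: str, measure: str, collected: date, version: int) -> str:
    """Build a descriptive, date-sortable file name for a data file.

    Using YYYYMMDD dates and zero-padded version numbers means an
    alphabetical directory listing is also chronological.
    """
    return f"{project}_{measure}_{collected:%Y%m%d}_v{version:02d}.csv"

name = data_filename("stairstudy", "gaitspeed", date(2021, 3, 15), 2)
print(name)  # stairstudy_gaitspeed_20210315_v02.csv
```

The design choice worth noting is the ordering: fields go from most general (project) to most specific (version), so related files group together when sorted, and the version is never left to memory or to names like "final_v2_REAL".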
This group is small enough that I think we may want to drop down to just doing two breakout sessions instead of four, as otherwise everyone will be by themselves. Let me make a suggestion. This group is pretty homogeneous; you are all from civil and environmental engineering. Would you object, Al, if we all stayed here and did this as one group, considering the background of our participants? That sounds good to me. Would folks mind chiming in on which one they would like to do? The first one has an engineering-related topic, but it's about educating engineers. Do you want to take a minute to look at the four, and we can figure out which one we want to do? I'll put the links in the chat, folks; these will take you to the discussion pages, and they will also have the DMPs if you want to peruse those real quick. That's in the chat now for everybody. So, do folks want to chime in? Does the first one seem like the best for everyone? My background is a little bit different. Whoever has a preference. Well, you were the first one to speak up; what's your preference? I clicked on the first profile real quick; I was just going through the paperwork. The other one seems a little bit more familiar. So just perusing now to get started. I think it's discussion point three that helps. Okay. Were there any other comments in the chat, Mike? No, not at this point, no. Okay. So the one thing that I wanted to point out, as I mentioned earlier: data management plans often become part of the formal grant application, and so with most of the grants that we process through our office and through the Office of Sponsored Programs, the data management plan is archived in Digital Commons. So these ones are all publicly available, as are most of the other ones for research that's been funded through, like, the USDA and NSF and NASA, things like that.
For a lot of the ones that have a public access requirement, the data management plans also become public and are archived in Digital Commons. This one is... sorry, because of Zoom a lot of my screen is hidden. The questions that we wanted to cover for this are: What kind of data is going to be collected? Where is this data going to be archived? Who is responsible for the data during the project and afterwards? How is this data going to be described for future use; what's the metadata; are they going to have data dictionaries? Who is likely to use or reuse this data in the future; in other words, who's the audience for the data? Will use of this data require special tools, software, or expertise? These are things that should be covered in the data management plan. And then, what may not be obvious, and is actually technically not in the data management plan, but is something you will need to know if you're working on one: does this funding agency require the deposit of publications for the grant under discussion? And if so, where do they need to be deposited? Does it also require the deposit of data, or just the publications? And where would you find this information? The hint for that is: it's on our website. It's not in the data management plan, but you do want to be familiar with it, because if you're doing one of these you will need to know this kind of information. So, that's where I'm going to turn this over to you all, and I'm going to be quiet. All right, well, so, working on your questions here: what kind of data is going to be collected? I'm going to go back to the written description I was working through, just because I was kind of curious. Let's see. So, different field measurements; it looks like some blood samples. Let's assume those are going to be animal, right? There's going to be some DNA sequencing.
So for our primary question here, what kind of data is going to be collected: were you anticipating a more in-depth dive into the data, or a surface-level look at what the general outcomes are? Well, some of it is an exercise: when you're writing a data management plan, these are the kinds of questions you're going to be trying to address with the document. So in reading these documents, this is a chance to see how other people addressed them, and when the funder or the program officer reads it, those are the questions they're going to be trying to answer for themselves. There isn't a right answer, but this is an example where you can see how somebody went about answering that question. Write down some answers and respond in the discussion section; you can see what other people responded as well, but you won't see it until you respond. So that was a classic "there isn't a right answer" answer. Multiple right answers, we should say. Right. And then, when we get through all of them, the final question is: do you think they adequately described their data, such that you could answer these questions? But so, where is their data going to be archived? Let's see here. It looks like it's USU, so Digital Commons is going to be the platform where it's archived. Who's going to be responsible for the data? Let's see. Offhand, my assumption is the PI, for, gosh, was it five to seven years? Usually longer is better, in the back of my mind, but let me dig a little deeper and see if they have any specifics in here. So: blood samples, no less than five years. And that's a good cross-section of different DMPs to look at; this one, you may notice, doesn't actually have a specific roles and responsibilities section. Yeah, and that's where being aware of what the funder is asking for is important.
And some of that information is intermixed. I think with this one, I suspect it's a small enough project that the PI is doing all of it, and so it's not broken out in that way. But many data management plans ask for an explicit roles and responsibilities section. And that's the ideal case too; I mean, it improves grantsmanship, right? If the funders are calling for a specific question or inquiry, we can literally make that a header: here's the breakdown for this primary question being asked. Making the document easier and more beneficial for them to read is going to really help all parties. Yeah, and that's where the DMP Tool and the templates it offers can be really helpful, because those templates are tailored to the different funders, so it prompts you for those different sections depending on what's required. It's really helpful for that kind of clarity. So, my experience with the American Heart Association grant: it's a unique situation for me, because almost all of their data you have to submit to their repository, but for pre-doc or postdoc fellowships the data aren't required. It's just good practice, obviously, so I've implemented that. But I did find it interesting that they didn't have a designated format or tool that said, you know, follow these steps. Looking back, there's always room for improvement, which is why I like these sessions: thinking, okay, could I do better moving forward? It would be great to see that implemented across all major funding agencies. The templates are great for NASA, the FDA, and so on, and then there are other funders that aren't my primary focus, so it would be nice for me to carry those tools across different domains. Great, thanks for sharing that.
And so, do they talk about how the data is going to be described for future use: what kind of metadata, data dictionaries, that kind of thing? They briefly talk about it. Let's see: a CSV and a text format. Scrolling down here to see what else... they make a reference that that will make it easy to use across multiple platforms, such as R or SAS code. So this one, they break it out into a section where they're talking about the standards they're going to be using. And as you might notice, they also use that same FAIR language of findable, accessible, interoperable, and reusable, although some researchers use that final R as "reproducible"; "reusable" is more common. Any other thoughts or comments on their metadata standards or their descriptions? So, who do you think the audience is for this data; who's likely to read, use, or reuse this data in the future? Fellow collaborators, researchers in similar fields, overall. I don't see general community members, per se, accessing this directly. Yeah, and part of the reason I ask this question is because that anticipated future use is one of the hardest questions to think through, both as a researcher and as someone who's managing the data. But it's often really important for two reasons. One, it makes the application more competitive if you have it well thought through. And two, for other people to be able to make effective use of your data, it's often critical to have thought through that. And so the best projects, the ones with the most highly reused data, have thought through that question really effectively. But it's one of the harder ones to really do: to think about, okay, who is the audience for my data, and how should I package it so that somebody else could use it?
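As a concrete illustration of the data dictionaries mentioned above, here is a minimal sketch in Python of what a machine-readable data dictionary for a CSV data set might look like. The column names and descriptions are hypothetical, invented for illustration; they are not taken from the DMP under discussion.

```python
import csv
import io

# Hypothetical data dictionary for a CSV of blood-sample measurements.
# Each entry documents one column: its name, type, units, and meaning.
data_dictionary = [
    {"column": "sample_id", "type": "string", "units": "",
     "description": "Unique identifier for each blood sample"},
    {"column": "crp_mg_l", "type": "float", "units": "mg/L",
     "description": "C-reactive protein, a systemic inflammation marker"},
    {"column": "collection_date", "type": "date (YYYY-MM-DD)", "units": "",
     "description": "Date the sample was collected"},
]

# Write the dictionary out as its own small CSV so it travels alongside the
# data set and stays readable in R, SAS, Excel, or a plain text editor.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["column", "type", "units", "description"])
writer.writeheader()
writer.writerows(data_dictionary)
print(buffer.getvalue())
```

The point is simply that a data dictionary can be a plain table shipped next to the data, in the same open format, so future users need no special tooling to read it.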
And in some ways it's very similar to authoring a publication, where you think through your audience, and sometimes you're authoring a publication toward multiple audiences. It's very similar with data, and it can be a really challenging, complex task. But for the next question: will the use of this data require special tools, software, or expertise? Do they touch on that? In this DMP, what stood out for me was that they said they had CSV and text files, so the data could be easily used in R or SAS code; they at least offer two different ways of accessing the data. There are packages to help analyze it; I don't know if there are any other special tools or expertise. I mean, the challenge of this data that stands out is that you have to be pretty comfortable with DNA sequencing, with microbiome research, and so on; I think it would be very challenging for someone who didn't have tailored training or expertise with these specific outcomes to really dive in. I've spent six months helping to pilot a microbiome project with older adults in a clinical study, and I feel pretty comfortable with that data, but I doubt I would feel comfortable trying to dive into this. That gave me at least some background, and I have a little bit of biology, but I think clear expertise and training are definitely going to be necessary to represent the data correctly. And that's one of the things I would also point out: at many of the funding agencies, the program officers work across a pretty large swath of grants. So it's important to be able to write for that audience. When you're doing your research, it's really easy to have those blinders on, where you're speaking very much to your colleagues in a very focused type of language, and it can be a really challenging exercise to include those aspects that are sort of invisible.
And so in this case, some of that expertise is a little bit invisible, because it's not obvious what expertise you would need to be able to deal with those DNA sequences. I think it's pretty well written, but I asked the question because it's an important aspect to think through: when you're writing a grant application, you have to remember that the person reviewing it may be from your domain, or they may not be. So it's a tricky document to write in a lot of ways, and it's often only two pages. But anyway, last question. Who knows where to find this information? This one is an NSF grant, isn't it; which funding agency did it come from? This is an NSF grant, yes. Yep. And so, where do you find the information about whether or not NSF requires the data and/or the publications to be made publicly accessible, and do they require publications, data, or both? Oh, well, going back a couple of minutes to when you introduced these questions, the first thing that came to mind for this last question was, for me, when I think of the AHA or NIH, I'm thinking about what's listed in the actual funding announcement. Usually it's very clear what aspects are going to be required or what to expect. So I think even when you're still contemplating the ideas and concepts, you should be aware of what's going to be required if funded. And I feel like that information is pretty easy to access, right? You can go to the primary agency, NSF in this case. And, you know, for people who are watching this later: you can reach out to you all. You're our primary resource to help us make sure we're doing our due diligence to follow through and make the data available. That's all absolutely true.
And there are also a number of researchers on campus who have multiple grants, most of the time with the same agency, but some of them have multiple grants within different divisions. It's actually not as easy to keep track of as you might imagine, which is part of the reason we're here, part of the reason we do consultations, and part of the reason the DMP Tool was developed: so that people could grab those templates and plug things in. But the short answer is, this is also why we collect and update this information on our website. For NSF, for example, they require that data be published in conjunction with the article, and they also require that any publications that come out of the research go into their public access repository, and putting it into Digital Commons is actually adequate for NSF. Every agency has its own requirements, and you can see here that there's USDA, NIH, NASA; they all have their own. Those are the three big ones, and then we have all of the other ones listed as well. We work to keep them up to date, so this is a nice landing page: if you just need to check what's required for your grant, you can come here. You can also dig through the NSF web page, but in my opinion it's a lot easier to just go to ours. Al's right; we've tried to pull out all that pertinent information and put it somewhere very easy for you to find, so use the website we put together; it's a really good resource. And if you have questions, as Al said, send me an email; we're here to help at our DMS. And so, the last question; we're coming up on the end. Let me put that back up on the screen. Would you recommend the project based on the DMP that you read? It seemed like it's solidly put together; they have a plan for their data.
I mean, I would like a little bit of clarification around the DNA sequencing, and a clearer plan for who is managing the data. In this case, since it's a small project, the PI is probably doing so, but just a little more clarity, so that somebody who may not have, you know, the same expertise or close knowledge of that project can feel more comfortable walking in and clearly understanding the exact steps. I'd give it a solid B. That's pretty fair, Sarah. This grant actually was accepted, and the researchers here on campus are working on it currently. And that's the other thing: all of the links we provided to the sample DMPs are from active research projects, so they were all funded. There are a number of other ones available in Digital Commons, and if you figure out how to get to the funding records, you can go through and look through a number of them. If you're curious, send us an email and we can point you at some that are more closely related to your research area, if you'd like. Yeah, and that goes for Sarah specifically, or anyone else, at a later date. And so, we're almost out of time, but were there any other comments about this particular DMP? Any questions you had after reading it, things you thought were particularly well covered, or things you thought were missing? No, not this time. I mean, this was great; I'm now familiar with more resources that are available to me, so that's awesome. I think you guys are putting in the work to make this successful for all of us, and I appreciate it. I will definitely reach out, probably for something a little more tailored toward my specific research, to see how I can improve my plans moving forward. I'm going to try to apply for a K grant coming up, so there are things I can think about in what I'm doing now and how I can improve moving forward.
And we're happy to help. The earlier people think through these issues in the data management planning process, before they're well into their data collection, the easier it is for us to help them archive it as well. And that's part of the story as to why the funding agencies have been pushing these data management plans. The other thing I wanted to reiterate is that the data management plan is part of the grant application, and so it becomes part of the proposal agreement. If you state something in your data management plan, it's something you can then be held to later, so it's important to remember that. Feel free to contact us; our email addresses are here, and you can also contact us via our website. We are out of time. As many of you were here yesterday, you know we have resources available, both in terms of tutorials and recorded training sessions. And please, I think Mike popped it into the chat, take the time to fill out our feedback questionnaire and let us know what you think of the session and/or future sessions we're planning. This year was particularly scaled back because of COVID, and we would like to offer a more fully featured program, as we did last year. So again, thank you for taking the time to participate; we really appreciate your input, and I hope this has been helpful. It has; thank you for putting this together. I appreciate it, and I'll send you all an email here shortly. Great. Are there any other comments? Oh, Al, someone really did have a specific question that was outside the realm of DMPs that he wanted to ask you. He's a PhD student over in civil and environmental engineering; he's working at the UWRL, I believe, the Water Research Lab.
And his question is: we've got USU Box. What other platforms are available for data organization, or is Box good enough for us here in the campus community for data organization and sharing? So, that's a complicated question, and this is one of the reasons we often engage in consultations with folks, because it depends on what kind of sharing you're talking about, and it depends on what kind of data. But for many applications, Box is fine. Campus IT has been engaged in conversations about large volumes of data on Box, and there's some question about how it was originally rolled out: some folks on campus were told that you could put unlimited amounts of data in Box, and that's obviously not true, but campus IT is still figuring out solutions for people who have very large volumes of data. So that's one of the reasons it's often a consultation issue, and we've worked with campus IT on figuring out solutions, but for most applications Box has been fine. There have been some issues where people have been running multiple versions of Box Drive and Box Sync, and it's caused things to disappear, basically, but again, that depends on how many machines are syncing. It's considered a fairly secure system. But again, that's where consulting with us can be helpful, to think through those issues of confidentiality or other restrictions that might need to be put on the data. In terms of public access requirements, Box is not adequate; the data needs to go into a public repository. So again, it really depends on the nature of the data. Mary, did you want to comment on that, in case I misspoke in talking about where campus is going with that? No, no.
A couple of things, and Al, you probably know more about it than I do, but, you know, Box is a paid-for service, and at some point it may become too expensive and things might move somewhere else. I don't think that's happening in the next six months, maybe not even in the next five years, but I always want to be looking ahead, you know, as you get an archive going. What I was going to say, really specific to you: Dr. Torres and AggieAir are accustomed to large, large data sets, and they might actually be more of an expert than any of us on where that's being archived and how to do it effectively, with all the video imagery they record at their sites. But yeah, that might be something you could bring to these discussions and help us understand even better, as AggieAir figures out the best way to archive their large data sets. Yeah, and we strongly recommend having an offline backup copy of data for just that reason: because it's a commercial service and it might get moved at some point, and you want an offline backup because, with those kinds of sync services, if there's a corruption, it gets propagated through all of your data as it syncs. So you want to have an offline backup copy. With Box, that's a little complicated, depending on how you want to do it, but we can make recommendations, so feel free to reach out to us and we can work through that as well. And just to be clear, hopefully I didn't raise any red flags or anything; we're not leaving Box, and there's no discussion of that. It's just that, as things go, you get accustomed to something and then the price starts going up, and at some point you have to do something. That's always a possibility down the road, but there are no plans for that.
Yeah, and I didn't want to sound alarmist either. I'm speaking as somebody who's done a lot of work around digital preservation, and you always want to have multiple copies, so that if anything goes wrong at any point, you have a backup copy. It's just one of those things that digital preservationists always recommend, and we touched on it briefly yesterday with the 3-2-1 rule: three copies, on two different media, with one offsite. So having it on Box is great, but you probably want to have an offline backup copy as well, just in case anything goes catastrophically wrong. Were there any other questions? I didn't really have any follow-up questions to that; it's a pretty general answer. I guess some general commentary: I recently went through that as well, for an observational trial we recently wrapped up and had our close-out on. It included 48,000 files and over three terabytes. We were initially under the impression that we had unlimited Box storage, so as I was doing the final backup on Box, I realized we had maxed out the current storage. So we had to come up with some alternative approaches for online-based storage versus, you know, a terabyte drive, per se. And the other sort of catastrophic thing that comes to mind, beyond the limitations of the commercial software: there have also been a number of problems at some institutions with ransomware and devices being encrypted. So having an offline backup copy, if your device goes down hard or your network gets hammered, can be really critical to getting things back up and running with minimal effect on your data and your research. I don't like to wave the alarmist flag; it's just good practice to have an offline backup copy, especially if it's critical to your research. It's a basic data management issue.
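One concrete way to guard against the silent corruption and ransomware scenarios discussed above is a checksum manifest: record a hash of every file when you make the offline copy, so you can later detect whether the backup (or the working copy) has drifted before you ever need to restore. A minimal sketch in Python, under the assumption that both copies are reachable as local directories (paths and function names here are illustrative, not any official tool):

```python
import hashlib
from pathlib import Path


def checksum_manifest(root: Path) -> dict:
    """Return {relative_path: sha256_hex} for every file under root."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest


def compare(original: dict, backup: dict) -> list:
    """List files missing from the backup or whose contents differ."""
    problems = []
    for rel_path, digest in original.items():
        if backup.get(rel_path) != digest:
            problems.append(rel_path)
    return problems
```

In practice you would run `checksum_manifest` against the working copy at backup time, store the resulting manifest alongside the offline copy, and re-run `compare` periodically; an empty result means the two copies still match byte for byte.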
So getting into terabytes and terabytes of data can be much more complicated, especially if it's up on a cloud server where you have to download all of it, and that takes two or three days. So we're happy to help; please approach us if you have any questions about doing it. We've worked with campus IT and library IT, depending on where things need to go, to figure out those problems, and we're committed to making sure that all of this is functional for researchers. We're aware that some of those IT issues can be pretty complicated, so if you can't get it sorted out, please let us know, so that we can advocate for you with campus IT and with the campus research offices. Well, great. Were there any other questions, comments, concerns? Thanks again for the feedback and the participation. Thank you, Al and Mike. And thanks, everyone, for participating. So, logistically: this is being recorded, and we're working on getting the videos uploaded to Canvas, so they should be available in a couple of days, assuming that goes smoothly. The content is already available in the modules on Canvas, but we're also going to be uploading the slides, so this should all be available for anyone who's doing it asynchronously, or if you want to revisit it. Thanks again, and have a great afternoon. Thank you. Thank you, everyone.