So as I said, my name is Christy Peters. For those of you who may not be familiar with me, I am head of the science and engineering library here on campus, as well as eScience initiatives, so I do a lot of data management related work in the course of my daily job. This is the first of nine workshops that the Research Data and Scholarly Communication Committee here inside the University of Kentucky Libraries is offering this spring. This is very much a general overview, so for those of you who have a pretty good understanding of data management, or who participated in the workshop that we held here in the library system last February, some of this may be review, but hopefully there will be some new content here too; there will not be complete overlap. You see before you the list of future workshops. All of these were included in the email that I sent out to all of you. If you are interested in participating in any of these, either virtually or in person, please register; we ask that you register so that we have a record of how many participants there were, how many people are interested, that sort of thing.

Okay, so my objectives today. This is really going to be divided into two general parts. I am going to provide an overview of some basic terminology that I think you commonly hear associated with data services in libraries, but that you might be a little confused about, and I will try to provide some clarification of those terms. You can certainly ask me questions throughout the presentation, but I will also stop after I finish that first half. Then we'll go into why it's important to manage data, very generally: why should you care? I will briefly go over UK's data retention policy and talk a little bit about federal data mandates,
and a little bit about the kinds of services that academic libraries, especially research libraries, are offering in the area of data management. Then, for the activity, I hope there will be time to get you logged into DMPTool and to create an account; I think that's the first step to actually being able to use it and talk to people about it. And then there's a case study. I brought copies here, and I sent all of you who are participating, in person and virtually, a copy of the case study. It is a very science-based case study, but I have modified it enough that I think it's pretty simple and easy to read; I didn't want to give you something that was super long. We will talk through the data management plan prompts for this case study, and I think it will help you get a sense for data management in practice: what do I do when I sit down with a researcher and talk to them about how to manage their data? I think actually seeing a case study and talking through these terms is far more beneficial than talking about things in a very abstract way. So again, if you have questions, please feel free to ask, either virtually or in person, throughout the presentation.

Okay. I think many people, when they think data management, think about big data, because big data is in the news. Whether you work in data services in an academic library or not, I think you hear this term, but it's a hard concept to wrap your brain around, and a lot of people assume that the researchers we work with on campus, in my capacity with my data management responsibilities, are people I'm helping with big data. That's not really the truth, and I'll explain why, but I thought I would discuss big data in general to begin with. So here are a few examples: social media, digital images with metadata and OCR text embedded within them, and web server logs.
These are all examples of big data, which tends to be characterized by a number of V words. (Sorry, this is a glitchy computer; it shouldn't blink out for long.) The three V words that rise to the top of the list most often are volume, velocity, and variety.

So with big data, we're talking about a tremendous volume of data: a petabyte or more. Just to give you some context, a petabyte of data is a million gigabytes. The flash drive that I brought my presentation into the classroom on is an eight-gigabyte flash drive; it would take 125,000 of those eight-gigabyte flash drives, full of information, to comprise a single petabyte. So that's the scale of data we're talking about with big data. Social media especially is a great example: petabytes of data are being generated every minute of every day and transmitted very rapidly, which gets to velocity. These things are created immediately, they are stored, you do analysis and visualization on them, and all of this happens faster than you can blink your eyes. That is another characteristic of big data.

And there tends to be a great deal of variety in big data. The kind of data that we're accustomed to working with in the library, for example, tends to be structured: it's the kind of data that fits in a relational database or that you work with in a spreadsheet. It's very ordered and quite easy to analyze; we don't have to worry about outsourcing that kind of analysis. But big data is often comprised of unstructured data as well. Social media, digital images, and web server logs are all examples of big data that are unstructured, not the kind of data we're used to dealing with. So then the question in a library context is: are libraries really in the business of dealing with big data ourselves? I would say that we're sort of getting there.
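The flash drive arithmetic above can be checked in a couple of lines of Python, using the decimal definitions of the units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes):

```python
# Decimal (SI) units: 1 GB = 10**9 bytes, 1 PB = 10**15 bytes.
GB = 10**9
PB = 10**15

flash_drive = 8 * GB           # one 8 GB flash drive
drives_per_petabyte = PB // flash_drive

print(drives_per_petabyte)     # prints 125000
```

So one petabyte is exactly 125,000 of those eight-gigabyte drives.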
Right, not really, not yet, not as a complete community, but some libraries are becoming more involved than others. Linked data is a great example; our very own Catherine Liebarger was just at a conference and was tweeting about linked data, and I think this might be a great example of a workshop we could do in the fall. But anyway, linked data is all about enhancing the web through the addition of structured data, and RDF is how we're doing that; RDF stands for Resource Description Framework. In terms of the benefit for libraries, it's all about exposing the metadata for our records to the community, so people can find our materials when they do web searches. They don't have to know to look in the website for the University of Kentucky Libraries; they can find this information very easily. If they're doing research on Africa, say, and their query matches something in our collection, they would find it, and that would be very beneficial for them and for us as well. The Library of Congress has been working to provide RDF representations of its authorities and vocabularies to help libraries in this regard. Some libraries are further along than others; it's not always a priority, since we have so many other things to do, but it is something that libraries are beginning to work on.

HathiTrust and Chronicling America are, in my mind, examples of big-data-level projects that libraries are undertaking in collaboration with other institutions. HathiTrust is a collaboration between academic and research institutions; over the years they have digitized millions of books, over 11 million, and it could be way more than that now. So you're talking about big data in terms of it being more unstructured than the data we typically work with, and it's certainly very large in volume.
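To make the RDF idea a little more concrete: an RDF statement is a triple of subject, predicate, and object, and the simplest serialization (N-Triples) can be written by hand with plain string formatting. The catalog record URI and title below are invented for illustration; only the Dublin Core title predicate is a real, published vocabulary term.

```python
# One RDF statement (a "triple"): subject, predicate, object.
# The record URI and title are made up; dcterms:title is a real predicate.
subject = "http://example.org/catalog/record/12345"
predicate = "http://purl.org/dc/terms/title"
obj = "A History of West Africa"

def to_ntriples(s: str, p: str, o: str) -> str:
    """Serialize one triple with a literal object in N-Triples syntax."""
    return f'<{s}> <{p}> "{o}" .'

print(to_ntriples(subject, predicate, obj))
```

Publishing statements like this for catalog records is what lets a web search engine, rather than just the library's own catalog interface, match a query against our collections.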
As for velocity, I would say that really gets at the solution for researchers and how you deal with HathiTrust, and even Chronicling America, material. HathiTrust developed, in a grant-funded project, something called the Data Capsule. Using the Data Capsule, researchers who want to do computational data mining of HathiTrust materials just plug an algorithm into the platform; the algorithm goes into the HathiTrust system, crunches the numbers, and spits back out the result, but the researcher never actually interacts with any individual items. So they get around copyright restrictions, and that's a real problem: it's incredibly difficult to get copyright permissions for even one or two books sometimes, and if you wanted to do data mining of five million items in the HathiTrust repository, you would never be able to get copyright permissions for all of those. But using this system, you don't have to. This is what I consider a big data solution for a library-led big data project.

Chronicling America, many of the people here in the library are probably more familiar with than I am. It's the website associated with the National Digital Newspaper Program, a partnership between the National Endowment for the Humanities and the Library of Congress to develop an internet-based database of US newspapers. I know there are over five million pages in the system, perhaps even more than that now, and researchers use this database in a very similar way: they can go in and do data mining and different kinds of projects on that content. Not all libraries have projects of this magnitude, by any means, but I do believe that libraries are starting to get into the big data business.

So, research data is defined many ways, but one definition is the recorded factual material commonly accepted in the research community as necessary to validate research findings.
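The "algorithm goes to the data" idea behind the Data Capsule can be sketched in miniature: the researcher submits analysis code, the platform runs it over the corpus, and only aggregate results come back, never the texts themselves. Everything below, the tiny corpus and the function names, is invented for illustration and is not the actual HathiTrust Research Center API.

```python
from collections import Counter

# Stand-in for an in-copyright corpus the researcher may never read directly.
_CORPUS = {
    "vol1": "the quick brown fox jumps over the lazy dog",
    "vol2": "the dog barks at the quick fox",
}

def run_capsule(algorithm):
    """Run the researcher's algorithm over each volume and return only the
    aggregated result, never the underlying text."""
    total = Counter()
    for text in _CORPUS.values():
        total.update(algorithm(text))
    return dict(total)

# Researcher-supplied analysis: count word frequencies across the corpus.
word_counts = run_capsule(lambda text: Counter(text.split()))
print(word_counts["the"])  # prints 4: the statistic comes out, the books do not
```

The design point is the boundary: the researcher's code crosses into the system, but only derived numbers cross back out, which is what sidesteps the copyright problem.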
Now, research data is going to vary greatly by discipline: the research data for a historian is going to be very different from the research data for a civil engineer. And this tends not to be big data. While there are certainly researchers on any research university campus who do projects that have big data associated with them, in my experience the researchers who need the most help with data management deal with a much smaller scale of data. It tends to be much more structured and much easier to manipulate; they just don't really know how to go about managing their data properly. So this is an area where I feel like academic liaisons really have an important role to play. As a librarian who provides data management services, I feel confident I could sit down with a researcher in any discipline, regardless of my knowledge of that domain-specific information, and help them with their data management problems, because I think an understanding of data management, how it works, and the kinds of issues people have to deal with is honestly more important than how well you know the content of that discipline. But there are things that are very specific within disciplines. For example, when we gave our data management workshop last February, I learned, even though I have a degree in the life sciences, that metadata in the life sciences is called annotation. I had no idea, right? If I had been working with a faculty member, talking about data management, and they mentioned annotation, it would have meant a very different thing to me; I think we would eventually have gotten to what they were talking about. But this is an area where I feel like academic liaisons, even if you don't plan on ever being the point person for all things data management in your field,
I think you would serve as a really important mediator between the researcher and the person providing the data management services, by having that domain-specific, discipline-specific data management information at hand. Most data management services in academic libraries are very much a collaboration between whoever in the library provides data management services and the academic liaisons, so it's very rarely one person who does it all for everybody.

So that was research data, very generally. There are different kinds of research data. Observational data could be sensor readings; it could be survey results. Librarians, we love surveys, right?
Experimental data is going to be much more prevalent in the sciences and engineering, and maybe in the medical fields. Simulation data comes up across disciplines; you create models for forecasting and things of that nature. Derived or compiled data comes from data mining; the HathiTrust example is the kind of project where you would go in and produce derived data. Any given project could include all of these different types of research data, but it's important to understand the distinctions, because the way that you manage data really depends on the type of data you're trying to manage. So if you're sitting down with a researcher and talking to them about their research data management needs, it's important to understand that there are different kinds of data and to ask about them. Typically, what people in library systems do when they sit down with a researcher is very much like a reference interview. You're asking different questions, perhaps, but you're trying to get at the specifics of the research they're working on and the issues they have, because often, just as with a regular reference interview, the people you're helping don't necessarily know what they don't know. They don't always understand what they need, but they know they need something, so it's all about asking them enough questions to tease out that information.

So, data management very generally relates to the actions that contribute to effective storage, preservation, and reuse of data and documentation throughout the research life cycle. It's everything from conceptualization and the creation of a data management plan to sharing your data once you're done with the project. As for the research data life cycle, you may have seen versions of it at conference presentations; they're all over the place, and there are a million different kinds. This is just one version that I like to use. Research is never this linear, of course; I understand that the arrows can often go in
many different directions, and you might be working on all of these different aspects of a project at any given time. But I do think this is helpful in conceptualizing how a research project works, and it helps lead a conversation when you're talking with a researcher about their research data management needs.

So it begins with project conceptualization and the creation of a data management plan, which researchers have not always thought about when they begin their research. But funding agencies are now requiring that they submit a data management plan with grant proposals, for very good reasons, so now they're trying to think about that at the very beginning, and that's often when they have questions for us. Then we've got data collection and assurance, which is just collecting your data; processing your data, which could be cleaning the data so that it can be analyzed; data analysis and everything that entails; and determining what data it's important to preserve and archive long term, because it's very likely you don't need to preserve all of the data associated with a research project for long-term storage and archiving, but faculty, they are researchers, and they don't necessarily know what data it's important to keep. Publication, data sharing, and data reuse are all aspects that need to be addressed in data management plans. So it's important, if you think you are ever going to interact with faculty about research data, to at least have a general understanding of what these different aspects of the data management life cycle entail. And storage, backup, and security are important to consider throughout the research data life cycle. By storage and backup here, I'm really referring to short-term data storage and backup, as opposed to preservation and archiving, which is long-term preservation of research data.

This is something that I've been thinking about for a while: in my mind, there is a divide between the part of the research data life cycle that
researchers, non-library researchers, are comfortable and familiar with, and the part that librarians and library staff are comfortable and familiar with. The researchers are all about getting the grant money and getting the project done; they're very focused on the front end of the research data life cycle. They want to go out, collect their data, and publish those results so that they can get credit for them, because that's what they survive on. The back end of the research data life cycle falls much more under the traditional umbrella of library services. Archiving and preservation, publication and sharing: data is perhaps a new concept here, but these are still traditional library roles, so we tend to be much more comfortable with this aspect of the research data life cycle. I feel like there's a learning curve on both sides. Researchers obviously need to get more familiar and comfortable working in the last half of this life cycle, and it's important, because you need to understand how you plan on archiving and preserving your data, how you plan on sharing your data, and the provisions you're going to make for reuse of your data, and you need to understand those things when you create your data management plan, before you begin doing the research. It's fine to come to us once you're done and say, I need a place to keep my data, but there are things you would perhaps have done differently in the front half of this research data life cycle had you known. Maybe you need to collect certain metadata in order to be able to share your data in a certain way, or to allow people to reuse your data in a certain way; if you don't know that when you do the project, then it may be too late to go back and retroactively collect that information.

As for librarians and library staff, my sense is that
many librarians and staff are very uncomfortable with research data services, and I think the reason for this is an assumption on our part, as librarians and staff, that we really need to understand the discipline. We don't have a PhD in civil engineering or in history or in anthropology or whatever discipline we're providing support for, so who are we to presume that we can advise somebody on how to properly manage their data? Like I said at the beginning, I really don't think you need to have in-depth, domain-specific knowledge, especially if you've got liaisons who have much easier access to that information. If it's a collaborative project and you're working with one another, it shouldn't be an issue. But even if you're not: I don't have a degree in civil engineering, but I am confident I could sit down and help somebody in civil engineering work through their data management problems. I think it's just a mindset thing. I do believe, though, that in order to do that you have to have a general understanding of research data, the research data life cycle, and the kinds of issues that are attendant with that. So that's one reason why I think it's important for all librarians, even if you never plan on being a data librarian, to have a very general understanding of research data management.

And that gets me to another soapbox, I guess, but I will just say that as librarians in a research library we are faculty as well; we're expected to do research in our own right. So my answer to librarians who tell me, "I'm not a data librarian, this doesn't matter to me, I really don't need to understand research data management, it's not important to what I do," is that I would argue it is important to what you do, if you ever plan on doing any kind of research in your own right and publishing it. In that respect you might not be providing advice to faculty or to graduate students outside of the library
setting, but it's still important that you manage the data associated with your own research, for the same reasons it's important for faculty outside of the library to do so.

So, components of a DMP, which is an acronym for data management plan. This is based on NSF; NSF was really the leader when it came to requiring data management plans with grant proposals. In January 2011 they began requiring a document, no more than two pages, that outlines how you are going to manage the data associated with the project. It includes types of data, metadata standards, data sharing, access, reuse, and archiving. As you can see, these really do align with the research data life cycle that I talked about to begin with. Some of these categories you would assume faculty would be really knowledgeable about, like metadata standards, for example: anybody who does any kind of research project generates metadata. It's just inherent in gathering information; you can't gather information and use it if you don't create metadata. But they don't necessarily understand metadata as a concept, so it's really about sitting down with them and helping them understand: yes, you're already capturing much of this information, but you're not necessarily doing it thoughtfully, and you're not necessarily doing it with an eye toward how you're going to be sharing your data once the project is done. In many cases, faculty and researchers already know some of these things; it's just a matter of putting them into context and into the terms of a data management plan.

So those were some of the definitions that I wanted to go over. Do you have any questions about anything that I have talked about, either online (Robert's watching online) or in here? Do you need clarification? Sure. Okay. Okay, so Jamie Burton said "preach," so thank you so much for that, I appreciate it. I'm not sure what it was related to, but we'll just assume it was for the whole thing. All right, so if there aren't
more questions, I will go on and start talking about why it is important to manage data, very generally. There are many reasons, and in this whole long list of reasons why it's important to manage your data, human error is certainly at the very top. There are so many ways that things can go wrong when managing data. Many of you have used flash drives or even external hard drives; it's very easy to lose storage devices, and even if you haven't lost one, they are corrupted very easily. I worked with an education faculty member at a previous institution who no longer had access to her PhD data, because all of it was on an external hard drive that was corrupted; she had worked with people in IT in her department, and nobody could extract the information from that drive. There are PhD students who lost years of research when their laptop was stolen or lost, and who is going to be able to redo that? That could effectively end your career, at least as a graduate student, because you're not going to get the funding, and you're not going to have the time, to go back and rerun those experiments. So there are many ways that things can go wrong through operator error, and some that aren't operator error at all. Format obsolescence, for one: data formats and storage media are changing all the time, so you may not be able to go back and read that information. Natural disasters, for another: there are many instances of people who have lost data because of natural disasters, be it on a small scale or on a large scale where entire servers were destroyed. The list could go on and on. If you do not want to lose your data forever, for whatever reason, it's really important to manage it properly, to store it properly, and to back it up properly; these are considerations you have to think about all the time.

A few other issues that I emphasize when I'm working with graduate
students and with faculty, talking to them about data management, include retractions. Journal article retractions are on the rise, and people are increasingly questioning the results of studies. In this case, there was a paper that was retracted because the authors were not able to provide the original data associated with the project. If somebody accuses you of using fraudulent data, or somebody simply questions your results because, based on similar studies they've done, they don't believe your results are valid, and you can't provide access to that data, then there's no way you can justify your conclusions. There are a lot of faculty who will tell you that they are sharing data when they publish a journal article, but I would challenge anybody to replicate an experiment based only on the data you find in a journal article; it is just not something that is likely to happen. And the data has to be usable. Simply holding up a flash drive or an external hard drive and saying "my data is here" is not enough. Can people make sense of your data? Have you included a data dictionary or a readme file that explains what all the variables are? What equipment did you use, and what version of that equipment? This is all information associated with data collection and data management that it's important to be able to provide, because otherwise people are not going to be able to try to replicate your experiments.

Here are a few charts that show that journal retractions are on the rise. From 2013 to 2015, every year there was an increase in the number of retractions; the percentage increase in 2015 was 37%. This is something that I really emphasize, especially to graduate students, because your credibility is on the line. There are a lot of reasons why a retraction is bad news, whether you're a faculty member or a graduate student. It's
also possible that new discoveries can be made with your data. In this case, the new discovery, I believe, was made by the people who actually captured the images with the Hubble Space Telescope, but many people use data that other people collected, for reasons the collectors never could have anticipated. Here, eleven years after this image was taken, they were able to find new information embedded in it, because they had access to equipment that wasn't available when the images were first captured. If they had not properly stored, backed up, and preserved these images and maintained their integrity, that would never have been possible. I have been to presentations by faculty who work in the sciences but don't have their own lab; a hundred percent of their research uses data that other people collect. So it's really important to manage that data properly, and to be thinking about how people might use it, because whether or not it's usable really depends on how well you have managed the data throughout your project.

Some general benefits to the researcher: managing your data well increases the impact and visibility of research, which is important to both faculty and students; it promotes innovation and new data uses; and it leads to new collaborations, which I really push with graduate students, because they're all about building those networks. It also maximizes transparency and accountability, which gets back to the journal retractions and to compliance; it's increasingly important that you be able to show that you really did what you said you did and that your results are valid.

So, issues for researchers. Creating the data management plan: NSF started requiring that data management plans be included in 2011, so that's about six years ago now. Initially, NSF did not actually score the data management plans when it determined whether or not to award a grant; that's changing. They gave researchers some time to get used to the
process and the system, but it is now really becoming more of an issue, because they are being graded on those data management plans: you have to actually be able to do the things you propose to do. This was a problem early on. When this mandate first happened, a lot of libraries were providing boilerplate language that faculty members could use. That was super convenient; you could go in and find an example of what you would say about how you want to store your data. But what we found is that researchers were copying and pasting that boilerplate language into data management plans without understanding what they were saying they were going to do. Because submitting a data management plan was just a matter of course, they didn't really care; it was just checking off a box. So it's really important now, and researchers are getting reviews back saying, this is not possible; you have got to propose something that is a viable solution for managing your data.

Managing the data workflow: we aren't doing this here, but there are libraries that actually embed librarians into research groups, and those librarians provide a lot of guidance on the entire data management process. That's an investment, and it's a resource issue, but there are libraries doing it.

Using data management best practices: I do teach this to graduate students, and in my experience graduate students really understand the need. They are often in a lab setting where they are given responsibility over maybe one aspect of a project, and maybe they're told how to manage the data associated with that one aspect. But when they leave, they are expected to go off and become faculty and researchers in their own right and to manage the data associated with entire projects, and they are in most cases not given instruction on how to do that,
which is why we have so many bad data management practices; it's pervasive across disciplines. So this is an issue; they recognize it's an issue, but they don't know what to do about it. And researchers and faculty are notorious, not just with data management, for assuming that graduate students know more than they do. So it's not just a matter of educating the graduate students on best practices; it's educating the faculty and the researchers that their graduate students need this instruction. That is one major area of outreach, I think, for people in data management services.

Creating consistent metadata: as I said earlier, faculty don't really understand this. Not all of them; I've talked to some faculty who really had a tight understanding of metadata, but many of them don't really understand metadata as a concept in terms of data management, or how to plan for it at the beginning of a project. So there are many libraries whose metadata librarians, traditionally a back-of-the-house kind of service, are now becoming much more front-facing: they are providing metadata services as a public service, as part of a comprehensive suite of research data management services. That's becoming much more common.

Sharing and getting credit for their work: this is telling researchers how they can do this and why it's important. And documenting compliance, which is increasingly important both institutionally and with funding agencies.

So, very briefly, the university's data retention policy. Not every institution has one, so we should be very glad that we do. It tends to be very vague, as most of them do. It is currently under review: I am chairing a university-wide group that was tasked by the Office of the Vice President for Research with reviewing the data retention policy and making recommendations. We've done that, and this semester we're in the process of revising the policy, so it will probably be changing, hopefully by the summer. But for right now, the essence of the policy is that UK
owns data resulting from scholarly activity undertaken at UK this is something graduate students really don't understand a lot of graduate students assume if I do the research this data belongs to me right and that is not in fact the case and it matters because it's important especially for their faculty advisors that they maintain that data once students leave and I have talked to faculty who had millions of dollars worth of research grants who could not tell me if he had all of the data from all of his graduate students the data that they produced so it doesn't mean you can't take the data with you but you have to leave a copy of it here right so that is something that it's very important to inform students and researchers and faculty about the retention period here at the University of Kentucky is five years but we do make sure to let people know that if you are being funded by an agency that has a longer period like seven years you defer to the longer period so there might be a project that has multiple different retention periods you always defer to the period that's the longest to make sure that you're in compliance with everything data must be retained sufficiently for reconstruction and evaluation which gets back to simply handing over a hard drive with raw data on it with no documentation or explanation of the process that was used to analyze your data that's not that's not really sharing your data right if it's not understandable you're not sharing your data and the investigator is responsible for retaining or ensuring retention of the data and providing access to it so this is tricky at an institution like the University of Kentucky where we don't really have infrastructure to provide a place for all faculty and researchers and graduate students to put their data so one of the roles that we take upon ourselves adrian does this i know i've done this and robert's probably done this too is to help researchers find an appropriate place to store their data 
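The "defer to the longest period" rule above is mechanical enough to sketch in a few lines. Everything here is illustrative — the funder names and their periods are made-up placeholders, not official requirements; only the five-year institutional minimum comes from the policy as described.

```python
# UK's institutional minimum retention period, per the policy discussed above.
UK_MINIMUM_YEARS = 5

# Hypothetical funder retention periods, in years (placeholders, not real mandates).
FUNDER_PERIODS = {"funder_a": 7, "funder_b": 3}

def effective_retention_years(funders):
    """A project covered by multiple retention periods defers to the longest one."""
    applicable = [UK_MINIMUM_YEARS] + [FUNDER_PERIODS[f] for f in funders]
    return max(applicable)
```

So a project funded by funder_a alone would be retained for seven years, while an unfunded project still falls under the five-year institutional minimum.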
There are plenty of options out there. Some of them are free, some are not; some are discipline-specific, some are very broad and generic. That is one area where we can certainly offer support now. And maybe this will change — maybe we will end up with data infrastructure here on campus that will allow some of this data to be kept here. Although I will say, UKnowledge — it is possible to use UKnowledge for some data. If you're working with petabytes or terabytes of data, maybe not, but like I said, many researchers deal with data that's much smaller in scale, and in that case Adrian certainly could talk with them and tell them whether it's possible to keep it there.

So, federal mandates — I've already talked about these a little. This was all driven by a mandate coming out of the White House Office of Science and Technology Policy to make publicly funded data available — which I guess is becoming an interesting topic today, right, but I won't go there — and that's what really prompted funding agencies to start putting these mandates in place. NSF, as I said, was an early adopter; 2011 is when it really started. Since then, as far as I know, all federal funding agencies now require a data management plan of some sort. The requirements are different, which is why DMP Tool, which I'm going to show you in just a few minutes, is very helpful: it takes the onus off of you of knowing what all the specific requirements are, because it provides a template with the prompts each of these different funding agencies asks you to address.

Well-managed, publicly accessible data is important. It enables scrutiny of research findings — for your own edification, for transparency, for compliance, and for protecting yourself, which is important as well. It encourages the validation of research methods, it reduces the cost of duplicating data collection, and it just saves you time, so you don't have to go back and frantically try to gather information you maybe didn't gather to begin with. It provides resources for education and training. There are many reasons I haven't even gone over for why data management is important, but those are a few.

In terms of data services in academic libraries — I think this is my last slide before we start doing stuff — I tend to think of them as falling into four general categories. Providing faculty support: for example, data management plan consultations, which we do. We haven't really pushed that as a service because there's always the question of what happens if everybody takes you up on the offer, right? I think libraries all around the world deal with these questions, but that is one example of support for faculty. Educating graduate students, I think, is a really important role for all academic libraries, because students really do not get data management training in their curricula. There are some exceptions — UPenn, I think Penn requires PhD students in chemistry to take a chemical information course that covers data management best practices — but it's very ad hoc, very sporadic. It depends on a particular department in a particular college at a particular institution understanding the importance; it's certainly not pervasive. Developing human infrastructure, which is kind of what this is: I'm providing opportunities for librarians, staff, and graduate students who work in libraries and who went to graduate school at a time — not that long ago, really — when this kind of training just wasn't available. I think that's where some of the nervousness about data services comes from, and that's what we're trying to rectify with these workshops. And then data infrastructure: some libraries and some universities have it, some don't. We don't really have good data infrastructure here, but like I said, maybe that'll change. Oh, I will point out
before finishing this part that there is a Research Data Management at UK research guide. Many of you have already seen it, but I wanted to point it out again. While it does have my name and contact information on it, at least on the front page, it is actually now being managed by the Research Data and Scholarly Communication Committee, so I am not the only person responsible for maintaining this page. I will show it to you very quickly, if it allows me. All right, I'm going to have to share this — okay, hang on just a minute. There had to be one challenge, right? It was going way too smoothly. Okay.

So this research guide — it's not overwhelming in terms of the amount of information, but I do think it has information that will help. It's got some getting-started information. It has a link to DMP Tool, which we're going to go into in just a minute; if you're working with faculty or students, you don't have to remember the URL — you can just go to this research guide, and the link is right there. I've also got a page with data management support information, and not just ours: it has all of our contact information — mine, Adrian's, Robert's, and Karla's, who is our copyright expert and can provide a lot of expertise in this area and in the social sciences. I've got information on the Office of the Vice President for Research, and information on IRB training — some graduate students in engineering actually suggested that I put that there. So if you ever have ideas about content we could put on this guide to make it more useful, please let us know; anybody on the committee can bring it to the table. UKIT — I'm going to have to change this; it's UKIT now, right, not UKAT. Then some really general grant requirement information; some basic information on data types and metadata — there are a million different metadata standards, but here are some of the more common ones if you're working with people and you just can't think of where to start; data repositories, with a few links that will help you look for data repositories in various disciplines; and then some general information on data sharing and reuse. So that's for your use: it's handy if you're at the reference desk, perhaps, and somebody comes up with a question, or if you're working with a faculty member and you don't deal with data management on a daily basis, this will maybe refresh your memory.

So now, are there any questions before we get into DMP Tool? Robert? Nothing? Okay, everything's perfectly clear. Okay, so, DMP Tool. For those of you both here and at home, please log into DMP Tool; the URL is here on the screen — it's a pretty short URL — or you can even go to that research guide, if you're already there, and click on the link. Maybe I didn't link it. Okay. Once you get to DMP Tool, you want to click on Getting Started. When I log in, my dashboard is going to look a little different than yours, but go ahead and click on Getting Started and choose your institution. We are an institutional member, so scroll down to the University of Kentucky, and it will ask you to log in with your linkblue information, so you don't have to learn another username and password.

Okay, once you get here — I don't think you're going to have quite as many options across the top — what's really important for you to know, both for yourself and for others, is that any faculty member or graduate student with a linkblue account can use DMP Tool and create new DMPs. I'm going to walk you through this process, and we're going to talk about this case study; because time is an issue, I'm not going to ask that you actually create a data management plan, which is what I would do if I had the time, but this will at least show you how to go through the process. If you click on Create a New DMP, it gives you the opportunity to select a template. Like I said, there are many different funding agencies — not just federal funding agencies, but private agencies and foundations who
provide grants. It would be impossible for any one person to keep up with what all the data management requirements are; the nice thing about DMP Tool is that the people who manage it on the back end keep all of these policies updated. NSF's has actually changed a little — I found I had to update some of the documentation in one of the handouts because of it. So if you go down and click on National Science Foundation, you will see that there is a link for each of the different directorates — NSF has 12 directorates specific to the area where you do research. For now, just go down to NSF Generic and click on it. It's not gonna — yeah, okay. Then you'll see it gives you the opportunity to put in the title of your project; you can put in your solicitation number and the proposal deadline; you can make the plan public so other people can see it, or keep it private; you can create a practice one; and you can share a data management plan with others. That's very useful, because many projects are collaborative, so you could both go in and work on a data management plan separately — it's kind of like Google Drive, right? So you create that — I'm going to keep mine private — click Save, or Next, and then it walks you through the five components of a data management plan that the National Science Foundation requires.

We worked on this in our workshop last year, and some people found it a little more challenging to get into and use the system than we anticipated. You can blow this text up to full size so you can see it, but this is what I gave you in a handout. Catherine, do you have the handouts? Could you bring me — let me see what I titled this handout — thank you so much. So: "NSF Data Management Plan DMP Prompts." All this is, is the text from those five categories in DMP Tool, copied and pasted onto paper so that you have it more easily accessible, because it is a little cumbersome to get in and actually see them online. These prompts give you really specific details about the kind of information NSF is looking for when you write your data management plan. That is very helpful when you're providing guidance to people — we're probably going to be marketing DMP Tool more than we have in the past, so if people have questions, this is helpful for you as well. And you can use it personally in your own research, even if you never share it with students or faculty.

So you type in your response, go to Next, and work through all of the different prompts, and then Preview DMP basically crunches it all together into a document with different paragraphs. You might have to do some modification — you might want to tweak the formatting and that kind of thing — but it's very helpful to have a place where you can do this. And what's nice is that on your dashboard you can collect all of the data management plans you have generated over the course of your career, which is helpful because sometimes you can reuse a data management plan with just minor modifications; you don't necessarily have to reinvent the wheel. I will also point out, before we move on, that you can click on the Public DMPs option if you want to look at a few DMPs and see what they look like in different disciplines. The one downside: I think it would be incredibly beneficial for us as librarians, and for faculty and graduate students, if we had a database of awarded grants, so that we could see data management plans that are solid and that were approved. The problem is that nobody can really make somebody share that information — not even the Office of the Vice President for Research can force it. So these are plans that people have created, but we have no way of
knowing whether they were awarded the grant unless they put that information in there. Okay — and Robert said that the Proposal Development Office here on campus, if you want to see an example of a data management plan in your field, will go out and solicit data management plans from people so that you have an example. That's very good to know. And — I don't know if it's the Proposal Development Office or the Office of the Vice President for Research more generally that has the library of links to different resources on campus — I'll try to find some of that information and send out a follow-up to this session.

So anyway, because we don't have a whole lot of time to actually work within the system, I just wanted to get you in there. Can everybody online raise your hand if you were able to successfully get into DMP Tool — not in here, raise your hand online. Everybody's good? Okay — oh, I can see them. Okay. So that's really it for the online presentation. What I want to do now is work with these two handouts that I gave you. What time is it, how much time do we have? This goes until 3:30, right? Okay, that's perfect. I didn't ask you to read this in advance because, in my experience, nobody does anyway, and it is relatively brief, so I'm going to give you about five minutes to read this rat-heart case study. I apologize for the content — some of you might find it a little disturbing — but it is quite brief and easy to read. So I'll give you a few minutes to do that.

Okay, the recording was paused and is beginning again. So what I want you to do, based on this case study — and it's fine for us to talk through this; there are problems with this case study. This case study is actually part of the NECDMC material that you can find online; NECDMC stands for the New England Collaborative Data Management Curriculum. It was part of an NIH grant-funded project, and there is a tremendous amount of information online through NECDMC — it can be a little overwhelming. They do have research cases like this, but I greatly modified this one. The original was very stream-of-consciousness — it was obtained from an actual researcher and they didn't edit it at all — and I just didn't feel like you could really do anything with it in a class setting, so this version is greatly abbreviated for ease of use.

So, looking at your data management plan prompts, the first question is types of data produced. Tell me: what kinds of data do you notice that you would have to worry about managing in this particular scenario? There are slides — that's a very good observation, because a lot of people assume data management is all about digital data, but management of actual physical materials is an issue too. It's fine if you've got all of your electronic data saved, but if you can't go back and access the slides that were important for your research, then maybe it doesn't all matter. So, physical specimens, certainly. What else? Images — images from multiple sources. You've got two different cameras and two different kinds of microscopes taking images at any given time, and it says approximately 10,000 images per experiment were collected. Think about what's entailed in managing that: if you don't manage it properly, it would be very challenging even for people involved in the experiment to go back and do anything with this data.

And I included in your data management plan prompts — I swear this was part of the NSF DMP requirement before — number six, roles and responsibilities. That is actually not in the system now as a requirement; I don't
know if NSF dropped that, or maybe it's just gone from the generic template, but I think it's actually very important, so I went ahead and kept the prompt there so it's something you can think about if you ever need to work through this kind of problem. Look at the lab group here — I'm going a little off topic — but if you look at the lab group responsibilities, you had multiple research staff who could be analyzing the same heart: one person measures the mechanical function of the heart, one person stains the samples, one person is responsible for the imaging — and I'm imagining different people did different kinds of imaging, because you had two different kinds of microscopes and two different cameras. The data sets should all be linked in the Excel spreadsheet, but everybody is responsible for entering their own data. So do you have proper protocols? Are you using the same file naming conventions? This is a big issue when we're talking about metadata, and Catherine will talk about this in more detail in the next session, on metadata, but it's very important to make sure everybody is using the same conventions, or else when you try to pull everything together, mistakes are going to be made. You had 10 people involved in the data analysis; there was one lab notebook for the entire group with observational notes, plus a paper surgical log, and the lab notebooks were kept in the PI's office. This is way off topic — I wasn't planning on going on this rant — but think of all the potential problems here in terms of data management. If you do not have a proper system in place to make sure everybody is on the same page, you are going to lose this data; some of it is going to fall through the cracks, you're going to follow inconsistent procedures, and it is going to be very difficult for you to properly manage that data and make it available should you want to share it in the end.
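One concrete way a ten-person lab group like this could keep its conventions consistent is to agree on a single file-naming pattern and check every name against it automatically. This is a minimal sketch; the pattern, field names, and instrument names are invented for illustration, not taken from the case study.

```python
import re

# Hypothetical naming convention for the imaging files:
#   <experiment>_<animal-id>_<instrument>_<date>_<seq>.tif
# e.g. rh042_A17_confocal_2017-02-03_0001.tif
PATTERN = re.compile(
    r"^(?P<experiment>[a-z0-9]+)_"
    r"(?P<animal>[A-Z]\d+)_"
    r"(?P<instrument>confocal|widefield)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<seq>\d{4})\.tif$"
)

def is_valid_name(filename):
    """Return True if a file name follows the agreed lab convention."""
    return PATTERN.match(filename) is not None

def build_name(experiment, animal, instrument, date, seq):
    """Build a convention-compliant file name; raise if the result is invalid."""
    name = f"{experiment}_{animal}_{instrument}_{date}_{seq:04d}.tif"
    if not is_valid_name(name):
        raise ValueError(f"name violates convention: {name}")
    return name
```

With something like this run over a directory of 10,000 images per experiment, a stray "heart pic final2.tif" gets caught immediately instead of at analysis time.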
And I'm assuming this is probably an NIH-funded project — that's an assumption on my part — but if it is, NIH has a policy that if your funding is over $500,000, you have to make your data publicly accessible. Are you collecting your data in such a way that you can make it accessible in a meaningful way? I would argue perhaps not in this case. So anyway, that was data types. Let's go back to — sorry, I'm shuffling multiple pieces of paper here — all right, what about metadata and metadata standards? Do they provide any information at all — if you were filling out a data management plan, what would you be able to say about their metadata practices? Catherine's looking; this is where we're really going to find out, because we've got Catherine here. (What? Oh, the screen — sorry, I'm not referring to anything in particular on the screen right now.) All I could glean — and I'll ask Catherine — is that they're using spreadsheets, so they are collecting metadata, but we don't know what it is. Every field they are collecting would be a metadata field, but there is no specific information here on a metadata standard, or file naming conventions, or any kind of protocol along those lines. It doesn't appear as if there's a data dictionary or readme file. You would hope they're documenting what the columns are, and who's doing what — who's collecting the data, what the equipment is, what version of the equipment they're using — but you don't know. And it looks like there's computer code, too; a lot of people don't think of computer code as data, but it is important, so that is also a consideration. And it looks like they're using paper notebooks kept in the PI's office, while data sets are backed up on an external hard drive at "some place" — I should have put "some place" in quotes; I don't know where that mysterious someplace is.
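A data dictionary of the kind this case study is missing does not have to be elaborate: one row per spreadsheet column, saying what it means, its units, and where it comes from, already answers most of the questions raised above. A sketch, with hypothetical column names and descriptions:

```python
import csv
import io

# Hypothetical entries for the lab's shared spreadsheet: one row per column.
FIELDS = [
    ("animal_id", "Unique identifier assigned at surgery", "n/a", "paper surgical log"),
    ("lv_pressure", "Left-ventricular pressure reading", "mmHg", "mechanical-function station"),
    ("image_file", "File name of the associated micrograph", "n/a", "imaging station"),
]

def write_data_dictionary(rows, stream):
    """Write a readme-style data dictionary as CSV to a text stream."""
    writer = csv.writer(stream)
    writer.writerow(["column", "description", "units", "source"])
    writer.writerows(rows)

buf = io.StringIO()
write_data_dictionary(FIELDS, buf)
print(buf.getvalue())
```

The point is not the script but the habit: the dictionary travels with the data, so anyone opening the spreadsheet five years later knows what each field means.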
So there are a lot of issues here in terms of documentation. Maybe they're doing it and it simply wasn't captured when the librarian sat down with the faculty member and gathered this information, but from what is in this very brief, incredibly consolidated research case study, that information is not available. Do you have any other observations, Catherine, that might pertain to metadata? Right — all of the information associated with the instrumentation people are using is very important. And again, version control is important in data management, period, no matter what you're doing, but it's important for instrumentation as well, because these things change, and that could affect your results if you were to run an experiment on an instrument that is several versions or updates out of date. So anyway, that's metadata — and again, please tune in to our next workshop; this is a plug: Catherine is fantastic, and she is going to go over very specific metadata material, including file formats.

Policies for access and sharing: I couldn't find anything on that here. So tell me, looking at the NSF data management plan prompts I gave you, what would be some things they would have to think about? (I could just shut off the computer — it's getting glitchy — but I can't do that, can I? Robert's like, no, don't shut it off.) An embargo period — so why might they need an embargo period? Valerie, who is head of our Agricultural Information Center, is right: this is a sharing issue, and maybe Robert and Adrian will get to this in their session, but it's always a challenge getting people to share their data when they want to squeeze it dry. They want to get every possible publication out of that data before making it accessible to other people, because they're scared they're going to get scooped — somebody might make the discovery or publish on their research before they do. So, an embargo period, certainly: you could upload your data to a repository and put an embargo period on it. That would give you time to do what you need to do, but the data would still be there, in such a way that it becomes accessible once the embargo period is lifted.

Do you have anything to add, Adrian, about sharing and access in this case study? I'm putting Adrian on the spot, sorry. Okay — well, that's a good point. Adrian said that if you publish in the Public Library of Science — PLOS, which has a whole series of journals — they require that you make your data accessible once the article becomes accessible; it needs to be simultaneous. And this isn't just true for PLOS; different journals have similar requirements, though it really varies from journal to journal. Some journals will let you attach data as supplementary material within the journal itself; some simply want to include a link — they say you can put your data in a repository somewhere else and link to it; maybe they don't require a particular place and just say that you have to share it; some have very general requirements. These are all considerations: if a researcher knows they want to publish in a certain journal, and that journal specifies that you have to share your data, that's good to know before you start your project. Maybe a journal or a repository you want to use requires that your data be in an accessible, non-proprietary format, but your data was collected on proprietary instrumentation and has to be converted so that it is accessible to people. It would be great to know that before you start data collection, because you can work that process into your workflow; if you find out right when you're getting ready to upload your data to a repository that you have to go back and do all of this conversion, that could be time-prohibitive, it could be cost-prohibitive — there could be a lot of problems associated with that. So this is why it's really important to think about access and sharing before you even start. This falls into the area outside of what researchers typically think about, in my experience at least; when they begin a project, they're not thinking about what they're going to do once they're done.

Okay, so that was policies for access and sharing. We've got policies for reuse and distribution — really, none of these were addressed, and I think this gets to what I just said. When the librarians at the University of Massachusetts Amherst — which is where this came from — sat down with the researcher, they simply asked the researcher to talk about their project, and it basically ended with data collection and analysis. This is exactly what I was talking about — let me see if I can actually get back to it — this is what I was talking about here: as far as researchers are concerned, once you get past data analysis and you write your article and publish it, you're done; they're not thinking ahead to data sharing and reuse. So let's look at those prompts really quickly. Restrictions: do you think there would be any kind of restrictions on the data associated with this research? They're doing animal research, which does have a lot of requirements up front in terms of getting permission, but it's not human-subject data.
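Building conversion into the workflow up front can be as small as a script run on every export. This is a sketch under an assumption: that the instrument can at least dump a tab-delimited text file (the column names are invented for illustration). Truly proprietary binary formats would need a vendor library or an export step first.

```python
import csv
import io

def instrument_export_to_csv(raw_text):
    """Convert a tab-delimited instrument export into plain, non-proprietary CSV."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical two-column export from a pressure-measurement station.
raw = "animal_id\tlv_pressure\nA17\t92.4\n"
print(instrument_export_to_csv(raw))
```

Run at collection time, this keeps a deposit-ready copy alongside the instrument's native files, so there is no frantic conversion pass right before the repository upload.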
When you're dealing with human-subject data, there are often a lot of requirements: you have to make sure the information is not identifiable when you share it, and things of that nature. In this case, with animal research, I'm not sure there would be restrictions of that kind, but you do have to think about it. And I have worked with researchers in different fields who say, "Sure, I'll share my data" — but it's not in a repository; it's sitting in a desk drawer, on a hard drive, or in a filing cabinet. If you make the hurdles so great that people can't really access your data, are you really sharing your data? I would argue no. It sort of gets to the whole metadata thing as well: you've got to find a journal article, perhaps, because the data itself is not published anywhere; you've got to figure out who the authors were; maybe only one of the authors has the data, so you've got to contact somebody; you're assuming they're going to respond, because they can say they're willing to share, but if they don't respond to your email or phone calls, you have no way of getting the data from them; and then you're banking on the assumption that the shared data will actually make sense to you, because they created a readme file or data documentation that would allow you to use it. So it's fine to say you're sharing data, but there are many parts to access that you have to consider beyond a theoretical agreement. It's always better to recommend that people find some kind of repository, because then it's handled: the data is out there, and if people do contact you, you can just point them to the repository link.

All right, so that was reuse and distribution. Archiving and preservation — I would just say, it's 3:17, so briefly: the consultation that needs to happen around archiving and preservation, in my opinion, is largely based on understanding what you need to preserve long term. I have worked with a chemistry faculty member who swore up and down that nobody else would understand his raw data — "only I can understand my raw data; it would be way too complicated and convoluted" — and that there was absolutely no benefit in preserving, storing, or making accessible the raw data associated with the project. But I would argue that typically the raw data is exactly what you want to store. You don't necessarily have to store the processed or analyzed data — maybe not always — because maybe somebody wants to use the data in a way that you didn't; they would have to do their own analysis anyway, and in order to do that they would need the raw data associated with the project. But maybe there are particular parts of the data, collected or processed or analyzed over the course of a project, that are very important to preserve for whatever reason, and working with somebody who is an expert in that area is very beneficial. This is one reason we have multiple people in the library system who provide these kinds of services. If I have a really complicated metadata question — I can answer metadata questions, for sure — I would contact Catherine, because Catherine is our metadata librarian and she knows metadata arguably better than anybody here. The same goes for data sharing and repositories; that would be an Adrian thing. So we do have a lot of resources here. We've got Sarah Dorpinghouse — she's not with us today, but she has a great deal of experience in archiving and preservation, and she would be able to consult with somebody on that kind of issue. That's the kind of thing that we as a committee are really trying to wrap our
brains around: what do comprehensive data management services at the University of Kentucky look like? Those are the conversations we're having, and that's the kind of information we're trying to gather with the campus survey whose results we'll have access to shortly. But anyway, those are the kinds of things you have to think about with archiving and preservation: how long will it be preserved, where will it be stored, do you need additional information. Again, these prompts provide a lot of helpful guidance when you are thinking through these processes. And then roles and responsibilities we already talked about.

It's hard to do this kind of activity — I'm done; Valerie's given me the time sign — when you've only got an hour and a half for a presentation. But I wanted to get you into DMP Tool so you could create an account, learn what it is, and understand that it's a resource you can use yourself and share with your patrons. And I also wanted to talk through the case study, because it's different from just hearing "okay, metadata — what about metadata?" When you've actually got a case study, you can think about what they did wrong and what they did right, and I do think that's beneficial. So hopefully you got something out of that. Are there any questions before I wrap this up? Robert, are there any questions online? Beth has a question. So — I've actually had Kathy Gretch, who I believe is director of the PDO, come and speak with us multiple times, and she has referred people to me and included my information. The question, for those of you who may not have heard, is: does the Proposal Development Office review data management plans? They will, for sure, and they will collect data management plans to provide as examples. But I have had them ask me if I would be willing to consult with people who need consultation, and I know they have included my information — not in every single one of those grant bulletins, but in some of the grant bulletins that get pushed out to researchers on campus — saying, if you need help with this, please contact Christy Peters. This is one of the reasons I was saying it's so important for there to be an inter-unit, interdepartmental, campus-wide group: no one unit can do it all; everybody is underfunded and understaffed. So hopefully, with this committee in-house and with the campus-wide committee that seems to be gaining momentum, we'll be able to work some of that out. So I guess "yes, but it's complicated" is my answer. Is there anything else?

Okay — Catherine, you have an assessment; how are we going to distribute that? Just in an email, okay. So we have an assessment for this workshop — I'm assuming we'll have the same assessment for all of our workshops — and I will send out an email to everybody who participated, both in person and virtually. Please be honest; this is the first time we've done this whole live-streaming thing. And if you have ideas for additional future workshops, please let us know that as well — linked data, I think, would be great, and I'm sure there are other ideas for workshops that would be helpful. We would appreciate your feedback. But thank you for coming and participating — we appreciate it. Yay! I am ending this recording now. Goodbye, and thank you again.