Welcome to the Data Management video series from the University of Wisconsin-Milwaukee Libraries. I'm Kristen Briney. I'm the Data Services Librarian here at UWM. And in this video, I want to tell you a little bit about a documentation structure that you might not have heard about before, but that is really useful if you have spreadsheet data or if you have data that contains a lot of variables. That type of documentation is called a data dictionary. A data dictionary basically provides the context for interpreting all the variables in your data set. So why would you want to use a data dictionary? Well, primarily a data dictionary can help you, the researcher, with understanding and remembering all the details about your data. If you have an important spreadsheet that you plan on reusing in a week, or a month, or a year, or even 10 years, it's worth creating a data dictionary for that data set because you're going to forget details over time. And if you write those down in a data dictionary, it's going to really help you remember everything that you need to know about that data set. The other time you want to use a data dictionary is when you're sharing a data set. This can be sharing with a coworker. It can be sharing with a colleague or a collaborator. It can be sharing with your PI if you're a graduate student. If you're going to give a data set to somebody else, having a data dictionary really helps that person understand your data better. And it means they won't be pestering you with questions about the data, because all that extra information they need to know is in the data dictionary. The final instance where you really should be using a data dictionary is when you're publicly sharing a data set. A lot of researchers are now under federal funding requirements to share the data that underlie publications.
And if you're sharing spreadsheet data, or if you're sharing data with a lot of variables, really do consider using a data dictionary, because that's what's going to help somebody else who's totally unfamiliar with your data pick up that data, understand it, and reproduce your results or use it for their own research. So there are a lot of reasons to use a data dictionary, but what exactly is a data dictionary? A data dictionary goes through your spreadsheet variable by variable, or column by column, and says: here's what the variable name is. Here's what the variable means and how we collected this data. Here's the format of the variable. Here's the precision. Here are the units. For example, this data is in grams instead of ounces. That's a really important distinction, and it's something you want to put in your data dictionary. Other things to include are how null values are represented, or what it means if you have a blank value; how particular variables are related to other variables in the data set, if that might not be initially clear; and basically any weird thing that happened with your data. For example, last week we had something funny happen and all our values are off by 10 points. You want to put that in the data dictionary, because that's going to help somebody understand these little quirks about your data set. So the data dictionary contains all of this information about each variable in your data set. It's not necessarily documentation about the data themselves, but documentation that gives you the context for understanding that data. And it's really important to put this information in a separate document from your data. Because, for example, if you have a spreadsheet, the whole point of having a spreadsheet is to make your data computable, to make it really easy to calculate values. And if you start putting in this extra information, it clutters the spreadsheet and makes it really hard to have a streamlined analysis of your data.
So have separate documentation, and have that documentation include all the necessary information for interpreting the data set. So that's a little bit about what a data dictionary is and why you want to use one. But let's actually look at an example, because I think it may help you understand what a data dictionary is. The example I'm showing you is from the Duke Lemur Center, which has a lot of information over time on lemurs, of course. The information that I'm showing you is weight information of lemurs over time that the Duke Lemur Center made publicly available in 2014. It's really a nice data set. If you look at it, it's really nice and organized. It's really clean. It's really consistent. And the variable names are good enough that you mostly understand what they are, but not always perfectly. For example, we have a variable in the spreadsheet called taxon. It has a bunch of codes, and just from looking at them, it's not clear what those codes correspond to. Presumably a taxon, but which particular taxon? So this is why we need a data dictionary. Looking at the data dictionary, they made it available as a readme.doc file. You can see that it has two major tables in it. The first goes over those taxon codes and says, here's the code, here's the Latin name of the species, and here's the common name of the species. That's all the information that you need to understand the taxon. The second table is really the heart of the data dictionary, because it goes through variable by variable and gives you extra information about those data. So here we can see we have a really nice data dictionary. It gives comprehensive coverage of the variables in the data set. It gives you all the coding values you need, and it really helps somebody who's totally unfamiliar with this data pick it up, understand it, and be able to use it right away. So I hope this example helps you understand what a data dictionary is and why it's useful.
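To make the variable-by-variable structure described above a bit more concrete, here is a minimal sketch in Python of what a data dictionary might look like when kept as its own small file. Everything in it is an assumption made for illustration: the column names (animal_id, taxon, weight_g, weigh_date), the descriptions, and the helper names are hypothetical and are not taken from the Duke Lemur Center files. It only shows the general idea of recording name, meaning, format, units, and missing-value handling for each variable, separately from the data.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class VariableEntry:
    """One row of a data dictionary: the context for a single variable."""
    name: str            # column name exactly as it appears in the spreadsheet
    description: str     # what the variable means and how it was collected
    data_format: str     # e.g. text, number, date, plus precision if relevant
    units: str           # e.g. grams rather than ounces
    missing_value: str   # what a blank or null value means
    notes: str = ""      # quirks, or relationships to other variables


# Hypothetical entries, invented for this sketch only.
DATA_DICTIONARY = [
    VariableEntry("animal_id", "Identifier assigned to each animal",
                  "text", "n/a", "never blank"),
    VariableEntry("taxon", "Species code, explained in a separate code table",
                  "text code", "n/a", "never blank"),
    VariableEntry("weight_g", "Body weight at the time of measurement",
                  "number, 1 decimal place", "grams", "blank = not measured",
                  "Recorded once per weighing of an animal_id"),
]


def write_dictionary(entries, handle):
    """Write the dictionary as CSV to a file-like object kept apart from the data."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(VariableEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


def undocumented_columns(data_columns, entries):
    """Return the spreadsheet columns that have no entry in the dictionary."""
    documented = {entry.name for entry in entries}
    return [column for column in data_columns if column not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a separate file on disk
    write_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())

    # Hypothetical spreadsheet header: one column has not been documented yet.
    header = ["animal_id", "taxon", "weight_g", "weigh_date"]
    print("Undocumented columns:", undocumented_columns(header, DATA_DICTIONARY))
```

The check at the end is one possible way to notice a column that was added to the spreadsheet but never written up. A data dictionary in a plain document or table, like the readme in the example, serves the same purpose.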
I really encourage you, if you have spreadsheets, if you have data that has a lot of variables in it, if you're sharing that type of data with other people, or if you're just planning on reusing it in a year, do create a data dictionary. It takes about 20 minutes to write all that stuff down, but it can really make the difference between having a data set that you would love to reuse but can't, because you don't remember all the details, and having a really usable data set. And that's the whole point of data management: making your data work better for you.