 Welcome to the Data Management video series from the University of Wisconsin Data Services. I'm Kristen Briney and I'm the Data Services Librarian here at the University of Wisconsin-Milwaukee Libraries. In this video I want to talk a little bit about spreadsheet best practices. And this is because spreadsheets are so ubiquitous in research. So many people use them and there are a few best practices that will make it easier for you to use your spreadsheets. And I want to warn you that some of the best practices I'm going to talk about in this video are a little different than the way you might already be using spreadsheets. And that's because spreadsheet best practices say that you should emphasize computer readability and computability of your spreadsheets over human readability. The reason for this is that people reuse data. People are sharing data on a scale never seen before. And so a spreadsheet which might have always been in Excel and can be used one way might now end up in R. And to make that transition we need spreadsheets that are computable. They don't have a lot of extra information because it gets lost in translation. So to really get the most out of our spreadsheets we really have to emphasize computability because that's the whole point of using a spreadsheet is we want to compute the data in some way and organizing it really helps. So to get your spreadsheets into a form that's most computable you want to do a couple things. The first is throw out extra formatting. So I'm talking about merged cells that's a common one in Excel, highlighted cells, extra font information like bold or italics. That stuff doesn't really matter all that much in terms of computability. And moreover if you're actually encoding information in your spreadsheet in the form of highlighted cells for example you're not putting that information in a way that you can actually do anything with it other than looking at it and knowing something about it. So if you're highlighting cells it's better actually to make another variable and say these particular data points have this particular property instead of just highlighting those cells or put that information in your documentation. So there's other ways you can present that information and make computable make it useful without just having it be formatting. So that's the first way to make your spreadsheets more computable. The second way to make your spreadsheets more computable is to really streamline them. So really what you want to do when you open up a spreadsheet you should really just see a big table full of data. Across the top you'll have one row that's all your variables and then every data point is one row under it. That's all the spreadsheet should really be. So if you have a lot of little tables in a spreadsheet see if you can't collapse those tables into a larger table and that's actually going to help you compute things because you can just say compute here, drag down, compute that whole column or compute that whole row. It's going to be a lot easier with a bigger table. The other thing about streamlining into one large table is getting rid of extra information that doesn't pertain to the table. And the first type of extra information is charts and graphs and figures and things like that. So take that extra information, take those images and put them in a separate tab. Often you see this in Excel and get them in a separate place on the spreadsheet so your spreadsheet is about your computation. The other type of information that you probably want to move away and out of your spreadsheet is documentation. So a few notes in a spreadsheet can really help. A lot of notes makes it really hard to deal with and doesn't necessarily help the spreadsheet be computable. So if you have a lot of documentation on a spreadsheet you can always use your research notebook to put that documentation in it. But I really recommend what's called a data dictionary. We did a whole video on data dictionaries because they are wonderful and they're really perfect for documenting spreadsheets. So I encourage you to watch that video, learn about data dictionaries and use them as a way to document your spreadsheets without really cluttering up your spreadsheet itself. So those are the two main recommendations for streamlining your spreadsheets, getting rid of formatting and emphasizing computability. I have a few other recommendations just in general for how to deal with spreadsheets, how to make it easier for you to work with spreadsheets. The first is choosing good null values. So what I mean is when you can't collect a particular data point or something happened and you had to throw it out, or other things that mean you couldn't measure anything, that's your null value. And oftentimes this is represented by the word null or a blank cell. There's just nothing there. So the point is that you want to have a good null value and use it consistently. A lot of different programs have different default nulls and if you stick with whatever your particular program is and whatever it recommends, that's a great suggestion. If you plan on using multiple programs, I recommend using just a blank cell. A lot of programs will interpret that as a null value. I do want to note that null is different than zero. So zero should always mean zero in your spreadsheets. It means I took a measurement, I did an observation and the answer came back zero. So zero and null are two different things and as long as you use them consistently, as long as you, I really recommend you write things down, say this is what I'm using for null, you shouldn't have any problems. The last recommendation for spreadsheet best practices is to keep a separate copy of your spreadsheet in raw data format from your analyzed data format. And this allows you, if you're in the middle of analysis and something goes wrong or if you want to change what you did for your analysis, you can just scrap all that, go back to your raw data, make a copy and start over. So keep those separate, keep them in different folders, different names and you just have a backup copy just in case. Another backup copy you want to keep on hand is probably a CSV file, a comma separated values file. And this is because a CSV file can be opened by so many different programs that if you lose access to your main program and whatever proprietary file type, if it only opens or it mainly opens, you have a backup copy of your data in CSV. And that is actually a good format for porting between multiple analysis programs. So it's a good copy of your data to have on hand. So those are the four recommendations I have for spreadsheet best practices. You want to eliminate extra formatting. You want to collapse everything down into one table and only one table. You want to use good null values and keep them different from zero. And you want to keep a backup copy of your raw data and a copy of your data in CSV just in case. And hopefully if you follow these spreadsheet best practices, you can get more out of your spreadsheets. They're easier to compute, they're easier to port between programs and you can get more from your data, which is the whole point of data management.