When you're getting ready to work with your data, you'll find that your life becomes a lot easier if you have tidy data. Now, this is kind of a funny term, but it actually refers to something specific within the data science world. If you go to Google and type in "tidy data," two of the first three results are going to take you directly to the article in the Journal of Statistical Software by Hadley Wickham that originally explained the idea of tidy data. The search can also take you to a page on the R project site that gives a lot more information about what is meant by tidy data. The idea is that tidy data is easier to manipulate and easier to take from one software package to another.
Now, this is where spreadsheets are a little problematic. Spreadsheets are amazingly flexible, and they allow you to do a lot of things that might suit your purposes when you're creating the spreadsheet, but they can make exporting or sharing with others kind of a headache. So for instance, I've got a spreadsheet right here that does some of the things that I hope you don't have to deal with.
First and foremost, we've got a problem here with merged cells. This is when you take a single cell and you spread it all the way across these other ones, sort of as a header. There's a merged cell, and there's another merged cell. This one is 15 cells merged together, and it's smack in the middle of everything. The problem with these is that they change the way the spreadsheet functions. So for instance, if I come here and click on column F, we see it's selecting this entire first cell and then selecting all of this, but if I then hit bold, it only changes this one. And if I try to hit bold again to switch it back, it doesn't do anything; I have to hit undo. And if I try to move the column, I'm just going to come right here and try to move it and, oops, it just doesn't want to work because of the merged cells.
Merged cells are things that people sometimes use to get titles centered, but they make it really difficult to work with the data. So you don't want that. The other thing you don't want is mixed-up labels, or things spread across different tables. That might work for an individual project. But again, if you're planning on sharing it with other people, or if there's the possibility they might take the data out of Google Sheets and into some statistical application or programming language, they're going to need something else.
Specifically, tidy data means that each column is the same thing as a variable; variables and columns are identical. And rows and observations are the same thing. So we need to get some of this cleaned up. Another thing you can see, by the way, is that with tidy data, all of the data needs to be in the spreadsheet. So this note is potentially an important piece of information, but it needs to be in a column that marks the rows it applies to. Also, we've got a comment kind of hiding right here. It tells us that maybe this person could be coded as being West or East, depending on how we want to do it. This is not the sort of thing you want to have in a comment, because if you export the data, you lose all of that information.
So let's take a look at how to tidy things up to make them easier to work with and less problematic. The first step is to get rid of all the merged cells. Get rid of those titles we had across the top. And then, for the information that said the first 10 were surveyed in person, make a new column and put that in there. So now we have that information in the spreadsheet itself. It's not stored somewhere else. I decided not to make a change for the comment because I didn't think that was necessarily critical. If it were, then you might want to have two locations, a starting location and an ending location, for each person.
Now, there are a couple of other problems.
One is that gender is just coded in a crazy way. We don't even know what the zeros and ones are; F and M we can guess. So we need to do a little bit of data cleaning right now. I'm going to come here to this tab. And you see I've done two things. Number one is I took the variable and I changed its name from gender to female. When you have a variable that you're going to treat as dichotomous, it's really easiest if you use a zero for no and a one for yes, and you name the variable by what the one is. In this case, I decided to assign female the one. It's arbitrary; you can do whichever one you want. But for anybody who put down that they were female, I put a one, and for anybody who put down that they were male, I put a zero. Similarly, for the method of surveying them, I decided that if they got surveyed online, I would give them a one, and if it were anything else, like in person, it would be a zero. And so I've cleaned that up.
But there's one more step that we need to do, and it has to do with this one right here. We've got information here about the location the person is in. This is a different level of measurement. Everything else is at the individual level. This is now about a larger group that they're in. And we're repeating a lot of information. If you've ever worked with a SQL database or another relational database, you know that you set up separate tables for different levels of measurement. You want to do the same thing with tidy data. So you keep information about the individual. But then if you have information at a different level that's repeated, you put that into a separate table. It's just a tiny little table. And then we only have to indicate whether a person's location was northeast, south, or west, and we can get the rest of this information from that table. And when you do those things, what you'll end up with ultimately is two separate sheets.
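If you ever do this cleanup outside the spreadsheet, the zero-and-one recoding I just described might look something like this. This is only a minimal sketch in Python: the rows, the column names, and the `to_indicator` helper are all made up for illustration, and it assumes the raw values are spelled exactly "F", "M", "online", and "in person".

```python
# Hypothetical survey rows; the column names and values are illustrative only.
raw_rows = [
    {"id": 1, "gender": "F", "method": "in person"},
    {"id": 2, "gender": "M", "method": "online"},
    {"id": 3, "gender": "F", "method": "online"},
]


def to_indicator(value, yes_value):
    """Return 1 if value matches yes_value (ignoring case and outer spaces), else 0."""
    return 1 if str(value).strip().lower() == yes_value.lower() else 0


# Name each new variable after what the 1 means: "female" and "online".
# Note: anything that is not a match becomes 0, including blanks or typos,
# so real data would need a check for unexpected values first.
tidy_rows = [
    {
        "id": row["id"],
        "female": to_indicator(row["gender"], "F"),
        "online": to_indicator(row["method"], "online"),
    }
    for row in raw_rows
]

for row in tidy_rows:
    print(row)
```

Naming the column after the value that gets the one means nobody has to guess what a zero or a one stands for later on.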
You'll have one sheet that contains all the individual-level information, where each column is a variable, each row is an observation, and all of the relevant information is put into those variables and columns. And you'll have a second sheet that contains the variables for the other level of measurement.
Now, again, this is mostly important if you're going to be exporting your data so it can be analyzed in another program. If you're working entirely within Google Sheets and the data is never going anywhere else, you might be okay doing whatever you want. Just be aware that the flexibility that you get with spreadsheets gives you a lot of room for creativity, but it can potentially create some headaches. And so when in doubt, take the tidy data approach: keep your data clean, and make it easy to work with, easy to analyze, and easy to get meaning out of.
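To make the two-sheet idea concrete, here's a rough sketch in Python of splitting one wide table into an individual-level table and a small location-level table, and then joining them back together when an analysis needs everything in one place. The data, the `regional_office` column, and the `split_levels` and `join_levels` helpers are all hypothetical; they stand in for whatever repeated location-level information your own sheet has.

```python
# Hypothetical "wide" sheet: the regional_office values are invented for
# illustration and repeat on every row that shares a location.
wide_rows = [
    {"id": 1, "female": 1, "location": "Northeast", "regional_office": "Office A"},
    {"id": 2, "female": 0, "location": "South", "regional_office": "Office B"},
    {"id": 3, "female": 1, "location": "Northeast", "regional_office": "Office A"},
    {"id": 4, "female": 0, "location": "West", "regional_office": "Office C"},
]


def split_levels(rows, key, group_fields):
    """Split rows into individual-level rows and a lookup keyed by `key`."""
    individuals = []
    groups = {}
    for row in rows:
        group_info = {field: row[field] for field in group_fields}
        # Store the group-level values once; complain if two rows disagree.
        if groups.setdefault(row[key], group_info) != group_info:
            raise ValueError(f"Inconsistent group data for {row[key]!r}")
        # The individual row keeps the key but drops the repeated fields.
        individuals.append({k: v for k, v in row.items() if k not in group_fields})
    return individuals, groups


def join_levels(individuals, groups, key):
    """Attach the group-level fields back onto each individual row."""
    return [{**person, **groups[person[key]]} for person in individuals]


people, locations = split_levels(wide_rows, "location", ["regional_office"])
rejoined = join_levels(people, locations, "location")
print(len(people), "individual rows;", len(locations), "location rows")
```

The location column is the only thing the two tables share, and it's all you need to look up the rest, which is the same job a key does in a relational database.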