So let's make it tidy. There are three main functions that I want you to know about for getting data into a tidy format: gather, which is most closely related to putting each observation in its own row; spread, which puts each variable in its own column; and separate, which puts each value in its own cell. So gather, spread, and separate. These are all listed in your cheat sheets for easy reference.

Let's just take a look. I've created a terribly silly little table here that I am saying is wide, and I want to make it long. I do that with gather. So how do you write a gather function? Well, I will first point out that whenever you're manipulating data, it's good form to put the manipulated version in a separate data frame or tibble from the original. That way you can always go back and do things differently, and you don't have to try and undo what you've done. You can just start fresh each time.

So you have your new tibble here. You assign it this name with this function. It comes from your original tibble, pipe, and gather. Universe, which I've marked here, is the name of the column that will hold the gathered column headings. Then name is the name of the column that will hold the gathered column data. And then in green I've got Marvel colon Dark Horse Comics: these are the first and final columns in a range of columns that you will gather. So the output is a long data set where universe holds what were previously column headings, name holds what was previously column data, and all of that comes from this range, Marvel colon Dark Horse Comics.

Any questions on gather?

If it's not just me, it'd be wonderful if you could just go through those two slides again. But yeah.

Yeah, that's fine. Got it, don't worry. There is a cheat sheet, and there are plenty of examples to work through. It can seem like it's hitting you really
fast, but yes, I can go through them again. So, save your output as a new variable. That's just good practice. You save things with this; we saw that earlier. Then your original data that you're working with, pipe, and gather. I think we're probably all together with that. Universe is the name of a new column that will hold the existing column headings. Then you have the name of a new column that will hold the existing column contents. Then you have the range of columns that you want to gather. You can also do this with a sort of list, so you'd have c, open parenthesis, Marvel, DC Comics, Dark Horse Comics, close parenthesis. But because I'm taking an entire range, I've shown you the range notation, which uses the colon between the first and the final columns of the range. And that's a lot of instances of the word column. So here, universe holds the headings that you have gathered, name holds the contents that you have gathered, and Marvel colon Dark Horse Comics is the original set of columns that have been gathered into headings and contents.

One more time, or are we good? Yeah, it can be a bit tricky, so let's actually go back two slides and I'll show you. Universe holds the headings, name holds the contents, and the range that I specified was this column through to this column. So there we go.

Any other questions on gather? You will get a chance to practice this. Gather is the one that I like the best; I think it's the most useful. I use separate as well, but the data sets that I've prepared for you didn't actually need spread as much, although you may find a need for it.

So, between spread and gather, as a matter of interest, I'll show you: spread does the exact opposite of gather, because it depends on the shape your data starts out in. I do want to be clear: there's not a right or a wrong way to have data. In general the tidyverse likes long data better, but sometimes, especially if you want to group things or join tables or whatever, you will need to turn long data
into wide so that you can join it with another table that is already wide, or something like that. You do need to know how to do both. So gather turns wide data into long; spread turns long data into wide. You'll see these are the same two tables as before, but they are now going from one to the other. Pretty much it's just the reverse of gather, in that you set up the same sort of start: the name of your new variable, the naming function, the name of the data set you're working with, the pipe, spread, and then the parentheses to enclose the contents of the function. You put the same columns here, universe and name. For the first one, it says: turn the contents of this column into column headings. For the second, it says: turn the contents of this column into the contents of the newly created columns. So it just does the exact opposite of what gather did.

Okay, we've got a few more minutes before we get to the short break, so I'll just say separate is a little bit more straightforward, I think. I think everyone knows, like the date column that we saw earlier, that sometimes things are jammed together in one column and they really ought to be split out. Separate splits that out. So if you have multi-value cells, you can create a table that has no multi-value cells, and this can be done in a couple of ways. If you know that there's a consistent format, a particular marker, in this case a comma (there's always the gender of the superhero, a comma, and then a one or zero to indicate whether or not they have a film), you can say: right, that structure, that format, I can use. So I will put the name of the column that I want to split, the names of the columns that I want to create, and the thing that I want it to split at. If that is a character, it will split at that character, and that character will not be included in either column.

You can also, depending on the structure of your column, split at a position. In this case, if it
had been whether it has a film, a comma, and then the gender (so this is just a different order that you could have had these jammed-together values in), again you have the column you want to split, followed by the names for the columns you want to create, and you can indicate the position. So you will separate them at position two, that is, the second position along, regardless of what is in that position. It could be a blank, it could be a hyphen, it could be a mix, so that some have hyphens and some have blanks and some have dashes or commas. It doesn't matter: it will split at the second position. Unlike splitting at a character, though, a split at a position does not drop anything: the value is cut after that position, and every character stays in one of the two columns. So when you split it, this is what you get: the two different columns. It's fairly straightforward, I think. That should say separate, not spread. Oh, the typos. I'll fix these, so that by tomorrow all of these typos will be corrected and you can use the slide deck in a more useful way.

Okay, so that's what separate looks like, and we are now on to the next set of exercises, tidyr, which basically asks you to gather, spread, and separate some of the files that we loaded in the last set.
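
For reference, the three calls walked through above can be sketched roughly as follows. This is a minimal sketch and not the slide code: the table contents, the id column, and the column names marvel, dc_comics, dark_horse_comics, gender_film and film_gender are all assumptions made up for illustration. It also assumes, as far as I understand tidyr's documented behaviour, that a numeric sep in separate() cuts the value after that position and keeps every character.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per universe.
# The id column and all values are illustrative assumptions.
wide <- tribble(
  ~id, ~marvel,       ~dc_comics,     ~dark_horse_comics,
  1,   "Spider-Man",  "Batman",       "Hellboy",
  2,   "Black Widow", "Wonder Woman", "The Mask"
)

# gather(): wide -> long.
# universe holds the old column headings, name holds the old column
# contents, and marvel:dark_horse_comics is the range being gathered.
long <- wide %>%
  gather(universe, name, marvel:dark_horse_comics)

# spread(): long -> wide, the reverse of the gather above.
# universe supplies the new column headings, name supplies their contents.
wide_again <- long %>%
  spread(universe, name)

# Hypothetical multi-value column: gender, a comma, then 1 or 0 for "has a film".
jammed <- tribble(
  ~name,          ~gender_film,
  "Spider-Man",   "M,1",
  "Wonder Woman", "F,1",
  "The Mask",     "M,0"
)

# separate() at a character: the comma itself is dropped from both new columns.
by_char <- jammed %>%
  separate(gender_film, c("gender", "has_film"), sep = ",")

# separate() at a position: here the values have no delimiter, and
# sep = 1 cuts after the first character without dropping anything.
jammed_pos <- tribble(
  ~name,          ~film_gender,
  "Spider-Man",   "1M",
  "Wonder Woman", "1F",
  "The Mask",     "0M"
)

by_pos <- jammed_pos %>%
  separate(film_gender, c("has_film", "gender"), sep = 1)
```

Each result is assigned to a new name (long, wide_again, by_char, by_pos), so the original tables stay untouched and any step can be redone from scratch. The new columns produced by separate() are character columns unless you ask for conversion.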