A few people are still trickling in. We'll get going here in a minute or two. Thank you all for coming if this is your first time, and if this is your second time, thanks for coming back. I'm in a new room; my wife kicked me out of my bedroom. You can see that I'm in a room that looks like it's been in a war zone. I had a mothur workshop earlier in the week, and I set the date of that workshop before I knew all this COVID stuff was going on, because I figured that by that date, March 30th, I'd be done cleaning up, renovating, and painting this room. I missed the deadline by a couple of days, but it's a little bit nicer than sitting in my chair with kind of crappy lighting. So anyway, it looks a little bit better, and hopefully by the next time we do this there'll be some paint to cover up the spackle and the graffiti that my kids left on the wall.

So we'll go ahead and get going. Thanks again for joining us. This is the second week that we've been trying to do this code club online. For those of you who are new, this is something that my lab has been doing over the last two years. We do it in person as part of our lab meeting about every week or every other week. Someone will pick a topic, present it, and break us up into groups of two or three people; we'll work through a problem and then come back. And so that's what I've been trying to do at a much larger scale.
My lab has about eight people in it, and sometimes we get some guests to come, but doing it with 15 to 30 people via Zoom seems like a fun thing to try, especially during this time when everyone's schedule is a bit in flux and people maybe want something concrete each week that they can look forward to. And of course, if you're not able to work at the bench and you have the bandwidth, mentally and emotionally, this is a chance to learn some other skills and perhaps get a little bit stronger with your programming.

As I mentioned last time, the number one rule when I break people up into pairs is: do not be a jerk. I really have no tolerance for people being jerks to each other. The feedback I got last week was that people really enjoyed meeting other people who were perhaps from the other side of the world, or from labs that they knew of but had never met a member of. So we're supposed to learn something, but I think more importantly it's also an opportunity for us to get to know other people and to have some fun while we're doing it. Being a jerk is not part of that.

In the feedback I got in the questionnaire that I gave at the end of the lesson, one of the things people were interested in was how we get data into a format that we can work with. So my plan today is to talk for maybe 10 or 15 minutes about two functions that are part of dplyr, and dplyr is a package that's part of the omnibus tidyverse package. Those two tools are called rename and recode. The data that I found, which I thought was kind of quirky and kind of fun, comes from FiveThirtyEight. They did a poll and they asked two questions.
They wrote a story about one of them. One was on the use of the Oxford comma. This is the idea that if you've got a list of three or more things in a sentence that you separate by commas, say "Bob, Sally, and Joe", we would write Bob, comma, Sally, comma, and Joe. That comma before the "and" is called an Oxford comma, and in some cases people will leave that comma out. There are various memes online about what happens if you leave out the Oxford comma, leading to some somewhat humorous predicaments for the characters in the sentence.

The other question they asked was whether or not the word "data" is plural. In Latin, data is plural; whether or not we actually use data as plural is another issue. So we might say "the data show" versus "the data shows", or "the data are indicating this" versus "the data indicates this". Data are, data is: which do you prefer? The grammatically correct form is to treat data as plural, but I think even I, in my pedantic moments, need to accept that grammar evolves as we use it.

Anyway, they collected about 1,100 observations in a survey where they asked people about Oxford comma usage, whether data is plural, and some other questions. So that's the background on this data set. They've made it accessible to the public through a link on GitHub.

What I'd like to do, like I said, is take 10 minutes here and give a brief overview of how we might use these two functions, rename and recode, and why we would use them. So I'm going to go over here to RStudio, and hopefully you were able to follow the setup instructions to get RStudio set up. I'm going to come up here to the upper left corner, where you see the green circle with the white
plus in it to open up a new R script, and I'm going to take these three lines of code from my prompt: library(tidyverse), then github, left arrow, read_csv and this link, and then the word github. So I'm going to copy and paste. One of the things that hopefully you're noticing is that you can read data into R from a website, which is pretty slick. This data set is available as a CSV, or comma-separated values, file on GitHub, so I don't have to download the file; I'm going to read it directly from the internet.

I'm going to go ahead and save this script to my desktop, and I'll save it as 2020-04-02-code-club.R. Hope you all had a good April Fool's Day yesterday; my kids enjoyed it immensely at my wife's and my expense. Anyway, whether or not you save it isn't that big of a deal, but it helps to carry this forward as we go.

To run these three lines, I can highlight all three lines and then click Source. Alternatively, I can put my cursor on the line and then hit the Run button, so I'll go ahead and do that. The first command we run is library(tidyverse), which loads the functions from the tidyverse package. The cursor jumps down to line three, which reads the CSV file, so I'll run that. The output from read_csv gives us a bunch of information: it says "Parsed with column specification".
So it tells us how the columns were read in: RespondentID was read in as a double; "In your opinion, which sentence is more grammatically correct?" was read in as a character; and so forth. github is the variable that this data frame has been assigned to. So if I run github on its own, and make this a little bit bigger, what it's telling me is that it's a tibble that's 1,100 rows by 13 columns.

You'll see that the column names are really long: "In your opinion..." something, "Prior to reading..." something. These are really long column names, and they're really nice for us to read. If we were looking at that CSV file, we'd know exactly what question they asked in the survey. But for working with computers, for working with R, these names are just horrible. Things like spaces, punctuation, and capitalization all make the data really hard to work with in R.

I think this is somewhat artificial, in that we're taking this from FiveThirtyEight, but in reality it's kind of real. I work with collaborators who are clinicians, and they give me data with information about their patients; or I might work with soil scientists, and they might send me data off of their machine. The column names or variable names that they give me might be formatted kind of like this. I don't want to go in and change the raw data they've given me. I want to use an R script to modify the data so that I can reproducibly go from the raw data they gave me to the data I'm working with.

So the first command we're going to learn is rename. rename will allow us to rename our columns. recode will allow us to take something like the values in this column, the sentence "It's important..." and whatever follows, and simplify them or change them to some other value, right?
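To make the point about awkward column names concrete, here is a minimal sketch. It does not read the real file (the link is not given in this transcript); instead it builds a tiny, made-up stand-in tibble that reuses two of the column names mentioned above. The rows are illustrative only, and the exact header text in the real CSV may differ from the spelling used here.

```r
library(dplyr)  # part of the tidyverse; also provides tibble() and %>%

# Hypothetical stand-in for the survey data: two made-up rows that reuse
# two of the long column names described in the session.
github <- tibble(
  RespondentID = c(1, 2),
  `In your opinion, which sentence is more grammatically correct?` = c(
    "It's important for a person to be honest, kind and loyal.",
    "It's important for a person to be honest, kind, and loyal."
  )
)

# Printing the tibble shows its dimensions and the type of each column
github

# A name with spaces and punctuation has to be wrapped in backticks
# every time it is used, which gets tedious quickly
answers <- github$`In your opinion, which sentence is more grammatically correct?`
answers
```

A short name without spaces, such as RespondentID, can be typed as is, which is the motivation for renaming the long columns.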
So we might take yes and no, and we might prefer to change that to TRUE or FALSE. That's where we're going, and we'll do some exercises in our small groups to test that out.

All right, so if we want to see the names of our columns, we can use the colnames function. Again, we can hit Run, or you can do Command-Return to run that line, and it'll then run in the console down here. I'm going to make my window bigger because I don't need that stuff over on the right. These are the column names that were at the top of our github data frame. They're really long, they've got punctuation, they've got various capitalization. It's kind of a mess, and not something I would really want to work with. So what we're going to do is use rename to change them.

I'm showing the syntax for rename here, if you can see my Safari screen. The syntax is to say rename, then the name of the data frame whose columns we want to rename, and then a new name equals an old name. We can then add a comma and add other columns, so we could set new_name2 to old name 2. I notice I've got a small syntax error in here: if your column name has a space in it, you need quote marks around that name, because again, R hates spaces. The typo is that the old name in this new_name2 line should also have quotes around it, because it has a space in it. Hopefully I can convince you, though, that we do not want spaces in our new names. We want to keep them in a nice format, and if you want a space, I really encourage you to use an underscore, like we did here for new_name1 or new_name2.

Something else that's built into dplyr is the pipe. So we can say data frame, pipe that into rename. I have another typo here that I need to fix: we don't need this data frame in here.
Sorry about that. So we can see this with real code. Using the github data frame, I can say rename, github as the data frame, and then take the RespondentID column, that's the old column name, and rename it respondent. I can also take the column "In your opinion, which sentence is more grammatically correct?" and rename that as oxford_or_not.

We can also do it using this pipe notation. I like the piping because, at least to my mind, it more directly shows how things are getting moved through the data. So we can do github, pipe, rename, and then give the new column name first. I'll say respondent equals, quote, RespondentID. I don't need the quotes there because there's no space, but for consistency I like leaving them in. It's not critical. I can also say oxford_or_not equals, and then in quotes the whole sentence: "In your opinion, which sentence is more grammatically correct?" So we're going to take that long column name and convert it to oxford_or_not. I can highlight those two lines, hit Run, and when I look at my column names, those first two have been changed to respondent and oxford_or_not.

Okay, so this is the rename function, and I'm going to have an exercise for you to work on with a partner, to think about how you would rename some of the other columns. Another thing we might want to do is change the values in our columns to make them a little bit simpler.
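The two ways of calling rename described above can be sketched as follows. This is an illustration only: the github tibble is a made-up two-row stand-in rather than the real survey file, and the header text follows the spelling in this transcript, which may not match the CSV exactly.

```r
library(dplyr)

# Hypothetical stand-in for the survey data (made-up rows)
github <- tibble(
  RespondentID = c(1, 2),
  `In your opinion, which sentence is more grammatically correct?` = c(
    "It's important for a person to be honest, kind and loyal.",
    "It's important for a person to be honest, kind, and loyal."
  )
)

# Function form: the data frame comes first, then new_name = "old name"
rename(
  github,
  respondent = "RespondentID",
  oxford_or_not = "In your opinion, which sentence is more grammatically correct?"
)

# Pipe form: the data frame is piped in, so it is not repeated inside rename()
renamed <- github %>%
  rename(
    respondent = "RespondentID",
    oxford_or_not = "In your opinion, which sentence is more grammatically correct?"
  )

colnames(renamed)
colnames(github)  # the original data frame still has the long names
```

Quoting the old name is only required when it contains spaces or punctuation; quoting every old name, as here, is a consistency choice.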
One of the handy functions to use with dplyr is select. If I do select respondent and then oxford_or_not, select will return those two columns for me. So let's see what this sentence actually is in the oxford_or_not column. What you'll see is that the sentence is "It's important for a person to be honest, kind and loyal", either without the Oxford comma or with the Oxford comma. What I'd like to do is take the sentence with the Oxford comma and turn it into a value that says Oxford; if it doesn't have the Oxford comma, I'll say it's not Oxford. I'll leave that there for now, and I will use this recode function.

Looking over at my web page, using the recode function kind of merges two functions. The first is mutate. mutate is a special function that we'll have to talk about at another code club, but mutate allows us to change a column. I'm seeing I've got some bugs here in my code. I'm sorry. All right, this is my demo of the generic syntax. So we're going to change a column, and the column I want to change is the column that I want to recode. So the column I'm going to recode equals the result of running the recode function on the column
I want to recode, right? So we're basically creating a new column, but we're writing that new column back over the old column. Hopefully that makes sense when we look at a real example. We then take the old value and set it equal to the new value. One of the things I hate about recode and rename is that with recode you say old equals new, and with rename it's new equals old.

So the syntax here is that we will say mutate, oxford_or_not equals recode, and we're going to recode the oxford_or_not column: take the "It's important..." sentence without the Oxford comma and say that's non_oxford, and the one with the Oxford comma and say that's oxford.

So again, I need to do mutate, and I'll say oxford_or_not equals recode, and I'm going to recode the oxford_or_not column, taking the old value and assigning it to a new value. I'm going to copy this first sentence, including the period, put it in quotes, and say that equals non_oxford, non underscore oxford. Then I'm going to copy the second sentence, with the Oxford comma, and say that equals oxford. Then we have two closing parentheses, one to close the recode function and one to close the mutate. Now if I run this, I see that my two columns have the respondent ID, and those sentences have been changed to non_oxford or oxford.

So again, this is rename and recode. If we want to know how many people chose oxford versus non_oxford, we can add to this the function count, with oxford_or_not, and this will count the number of people who preferred the Oxford comma versus those who didn't.

Again, we included this select line here to get rid of all the other columns. If you want to keep all those other columns, remove the select line. So if I go ahead and remove that and then run these lines, it does the same thing, but if I get rid of this count, then we see the full data frame, right? We see the
full 13 columns, but our first two columns have been modified.

I know this is going fast. The information is in here, in these two sections on rename and recode. What I'm going to have you do with your breakout group is spend, say, 10 to 15 minutes using the rename function to rename more of the columns in the data frame. I renamed the first two columns. Go ahead and see if you can rename some of the other columns and come up with good variable names for them. Maybe talk with your partner about what you think makes for a good column name. Then I'd like you to use the recode function to recode the values in more of the columns. And finally, as a stretch, or for homework if we don't get to it: this is my code for effectively doing what we just did, but getting the percentage of people who use the Oxford comma or not, and what I'd encourage you to do is think about what fraction of people would use data as a singular versus a plural word.

So again, we'll take maybe 25 to 30 minutes here. I'll split you all up into different groups. Before I do that, are there any questions that anyone has before I split you up into pairs?

I had a question.

Sure.

So if I run colnames(github), I don't see the change in the names of the columns. Is that right?

Right. You won't see the change in the names of the columns because we haven't written things back to github. We're taking that data frame and modifying the columns, but we're not saving it back to github. If I had done, say, new_github, left arrow, and assigned the pipeline to that, then when I run this, new_github has the modified columns. But because I never saved or assigned that back to github, we don't see it.

All right, great.
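Putting the pieces from the demo and the question above together, a minimal sketch of the rename, recode, count, and assignment steps might look like this. The github tibble is a made-up four-row stand-in, not the real survey, and the percentage line is one plausible way to compute it; the instructor's own percentage code is not shown in this transcript.

```r
library(dplyr)

with_comma    <- "It's important for a person to be honest, kind, and loyal."
without_comma <- "It's important for a person to be honest, kind and loyal."

# Hypothetical stand-in for the survey data (made-up rows)
github <- tibble(
  RespondentID = c(1, 2, 3, 4),
  `In your opinion, which sentence is more grammatically correct?` =
    c(with_comma, with_comma, without_comma, with_comma)
)

# rename() is new = old; recode() is old = new.
# Assigning the result to new_github is what saves the changes.
new_github <- github %>%
  rename(
    respondent = "RespondentID",
    oxford_or_not = "In your opinion, which sentence is more grammatically correct?"
  ) %>%
  mutate(
    oxford_or_not = recode(
      oxford_or_not,
      "It's important for a person to be honest, kind and loyal." = "non_oxford",
      "It's important for a person to be honest, kind, and loyal." = "oxford"
    )
  )

colnames(github)      # unchanged: nothing was assigned back to github
colnames(new_github)  # respondent, oxford_or_not

# Count each answer, then turn the counts into percentages (assumed approach)
oxford_counts <- new_github %>%
  count(oxford_or_not) %>%
  mutate(percent = 100 * n / sum(n))

oxford_counts
```

Assigning to a new name such as new_github, rather than overwriting github, keeps the data as it was read in available for comparison.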
Thank you. So let me stop sharing my screen, and I'm going to assign you all to pairs. Go ahead and introduce yourself to your partner. When we're about halfway through, I'll send you a reminder to maybe transition from renaming columns to trying to recode values. Don't feel bad if you don't get to that third question. That's really meant as a stretch if you can get there, or as homework to follow up on later in the week to reinforce some of what we've talked about today. So I'll go ahead and split you up, and then I'll give you a few-minute warning when we're close to the end, and we'll come back and report our results.

Okay, hopefully you found that interesting, or a good process of working through how to rename columns and how to recode variables. One of the groups I popped in on talked about how it's a challenge to take these really long names that are actually descriptive, because they're using like 10 words, and condense them down to something that is still descriptive and informative and captures the essence of what the column name was.

So we've got two exercises, or I guess three if people got to it. Does anyone want to share with us how they went about the first exercise, using the rename function? Would anyone like to share their screen and show what their group did?

Maybe I'll go ahead and show mine. The only right answer is getting an answer, right? There are many ways to do things in R, so don't be ashamed if yours isn't beautiful. That's fine. If it works, it's beautiful.

So I'll show you the solution that I think I've posted. Hopefully you can see this here. I took the various old names that were provided. So "In your opinion, which sentence is more grammatically correct?", I changed that of course to oxford_or_not. The next question was "Prior to reading about it above, had you heard of the serial (or Oxford) comma?" The serial comma is also known as the Oxford comma, and so I renamed that heard_of_oxford. I think this one, importance_of_grammar, was maybe a little bit longer than I wanted it to be, just because I'm lazy, right? I don't want to type more than I have to.

But hopefully you can see that the big thing is finding good column names to replace these very long sentences with, and thinking about, again, how do you make them descriptive? How do you make them concise? I try to keep everything lowercase because I don't want to have to worry about whether something was uppercase or lowercase. In "Household Income" here, both words were capitalized, whereas some of the others were in sentence case.

So anyway, this is the rename step that I wrote (sorry for the slip), where we used respondent in place of RespondentID, and where I then plugged in my new column names in place of the old column names. I'll go ahead and copy this over here. I think if I'm in a code chunk that's connected by these pipes, and my cursor is anywhere in it, I can hit Run and it will run the whole code chunk. I ended this with colnames, so you can see the column names as they came out; or, if I get rid of that and run the code chunk again, you can see that the data frame now has those column headings.

All right, so hopefully you were able to get one or two other column names
changed. I had all day to work on this, so I had plenty of time to do it. I only gave you maybe 10 or 15 minutes, so don't feel bad if you didn't get them all done.

For the next exercise, has anyone worked up the courage to try describing to us what they did for renaming, or I'm sorry, recoding a column?

Sure, I can do that.

Sure. I'll stop sharing my screen and then you can share yours, if that's okay. So go ahead.

So we looked at the column "How would you write the following sentence?" That column had this particular question as its header. Then we used rename to change that header to is_or_are, because the options people responded with were either "Some experts say it's important to drink milk, but the data are inconclusive" or "...but the data is inconclusive". So we changed the sentence that had the word "are" to are, and if the sentence had the word "is", we changed it to is. We used the same syntax: we used mutate, created a new column is_or_are, and then recoded that to either are or is, based on the sentence it had.

Yeah, great, very good. And it looks like you were trying some other things down below with other columns. So good, that's exactly what I did. If you want to go ahead and stop sharing your screen, I can come back and share mine.

If you look at my exercise two, I took the same column. Instead of, I think you called it is_are, I called it singular_or_plural. And I recoded singular_or_plural: for the one that "is inconclusive", I said singular; for "are inconclusive", I said plural. Perhaps what you could also have done would be to rename this column to plural, and then say "is inconclusive" equals FALSE and "are inconclusive" equals TRUE. So there are many ways to do this. Just because this is how I did it doesn't make it right, right?
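Both versions of this exercise, the singular/plural labels and the TRUE/FALSE alternative, can be sketched as below. The survey tibble is a made-up four-row stand-in (including one missing answer), the column name singular_or_plural is written with underscores as an assumption, and the header and sentence text follow this transcript rather than the real file.

```r
library(dplyr)

# Hypothetical stand-in for the relevant survey column (made-up rows)
survey <- tibble(
  `How would you write the following sentence?` = c(
    "Some experts say it's important to drink milk, but the data is inconclusive.",
    "Some experts say it's important to drink milk, but the data are inconclusive.",
    "Some experts say it's important to drink milk, but the data is inconclusive.",
    NA
  )
)

# Version 1: recode the two sentences to the labels singular and plural
recoded <- survey %>%
  rename(singular_or_plural = "How would you write the following sentence?") %>%
  mutate(
    singular_or_plural = recode(
      singular_or_plural,
      "Some experts say it's important to drink milk, but the data is inconclusive." = "singular",
      "Some experts say it's important to drink milk, but the data are inconclusive." = "plural"
    )
  )

recoded %>% count(singular_or_plural)  # the missing answer shows up as NA

# Version 2: name the column plural and recode to FALSE / TRUE instead
as_logical <- survey %>%
  rename(plural = "How would you write the following sentence?") %>%
  mutate(
    plural = recode(
      plural,
      "Some experts say it's important to drink milk, but the data is inconclusive." = FALSE,
      "Some experts say it's important to drink milk, but the data are inconclusive." = TRUE
    )
  )

as_logical
```

In both versions the old sentence must match the stored value exactly, including the final period, or recode leaves that row unmatched.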
It works for me, and perhaps it works for what I want to do with subsequent analysis. If I go ahead and copy this into my R script and run it, I get a table that says 228 people use data as plural and 865 as singular, and 36 people, I think, were perhaps confused by the question, or perhaps they didn't get that far on the survey. We can tack this final line onto it to get the percentage, but I think it's clear: about 76 percent of people said that in this sentence data was singular, about 20 percent said it was plural, and about 3 percent didn't know.

There are other things that we could do, and that we can talk about perhaps in future sessions. If you've got data that's an NA, how can we get rid of the NA so we're only looking at values from people who had an opinion?

So hopefully this made sense in thinking about how we can work with data. Again, one of the big motivations here is that if we get raw data from a website or from a collaborator, we don't want to change the raw data. We want to leave it raw and use a script to modify that data frame so that we can work with it going forward. That'll help make our analyses more reproducible, and if people have questions about how we coded things or how we worked with the data, we can show them the script and the provenance of all our data.

Okay, so, any questions that people had about using rename or recode? You can raise your hand under the participants panel, or under your name, I think, if you have a question, or you can unmute yourself and just ask. People seem kind of bashful.

Hi Pat. Yeah, so I have a question.
I'm not exactly sure how to frame it, but downstream I'm pretty sure that I'm going to be partitioning this data if it's a binary answer. Are there times when you might find utility in using a more generic TRUE/FALSE or yes/no, and having the header be more indicative of the content?

Yeah, so on the question about partitioning the data: there's a function called group_by. You could take your data frame and group it by the singular_or_plural column, and then do some analysis. I think there was something in here about age, right? So we could ask what's the average age of people who want to use singular or plural.

Maybe what I could do, just here in the closing minutes, is add to this group_by singular_or_plural. So I'm going to take my data frame and split it into people who think it's singular and those who think it's plural. I could then do summarize, and I could say mean_age equals mean of age, and then n equals n().

So I have to add something here. I've got some bug in here somewhere, and I don't know what I'm doing wrong. Anyway, even though the mean isn't working right now, you can kind of see it for the n: it's taking the data frame, partitioning it by singular, plural, or NA, and then within each of those groups it's reporting back, hopefully, theoretically, the average age, or the number of individuals.
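The grouping idea described above can be sketched on a made-up stand-in. The age column here is hypothetical and numeric so that mean() has something to average; how age is actually stored in the survey is not stated in this transcript. As a general R fact, mean() returns NA with a warning when it is given a character column, though whether that was the cause of the problem during the session is not known.

```r
library(dplyr)

# Hypothetical stand-in: recoded answers plus a made-up numeric age column
survey <- tibble(
  singular_or_plural = c("singular", "plural", "singular", NA, "singular"),
  age = c(34, 52, 41, 29, 67)
)

# Split the rows by answer, then report one summary row per group
by_usage <- survey %>%
  group_by(singular_or_plural) %>%
  summarize(
    mean_age = mean(age),  # needs a numeric column
    n = n()                # number of rows in each group
  )

by_usage
```

The missing answers form their own NA group, which matches the three-way split (singular, plural, NA) described above.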
So that's one way you could do it. If you wanted to, say, exclude people who thought it was singular, you could then use filter, and filter might be easier to work with a TRUE/FALSE column, but it's also not that big of a deal to say singular_or_plural equals singular and work with that group. Those filter steps and this group_by might be something that we work with in the next couple of weeks, as we think about these dplyr pipelines for analyzing data. I'm not sure what I'm doing wrong in this set of commands. It should work, but anyway.

All right, so this brings us to the top of the hour, and I want to thank you all for participating and for your questions. We'll be back again next Thursday at three o'clock. I'll try to post a teaser earlier in the week with the link, and then usually around one o'clock on Thursday I'll post the prompt and the activity for the day. So please tell your friends and colleagues to join us, have a good week, and we'll talk to you soon. Stay healthy.