So welcome back to another week of Code Club. Again, my name is Pat Schloss. I'm a professor at the University of Michigan, and through these Code Club activities I'm trying to lead everybody in an incremental way through learning R programming, and trying to do it in a social format. So I'll talk for a bit about two new functions in the package called dplyr. We're going to talk about count and filter, and then I'll turn you loose to work on some exercises. At the end, we'll come back and share what you all came up with, and hopefully people won't be too bashful to speak up with what they found.

So last week we talked about this data set that was generated by FiveThirtyEight, where they asked about 1,200 respondents whether or not they use the Oxford comma, and whether they treat the word data as a singular noun or as a plural noun. So do you say the data are interesting or the data is interesting, right? Technically data is a plural noun, but language evolves, and so usage and how we think about words changes as well. So it's kind of an interesting data set. The data is primarily categorical data on how people answer different questions about these two grammatical points, as well as things like their socioeconomic and educational background.

As I said, this week we're going to talk about the count function and the filter function. Both of these are in the dplyr package, which is part of the overall tidyverse. If you weren't able to join us last week, we talked about the rename and recode functions, which we used to modify the data set from this study that FiveThirtyEight conducted so that our data frame would be easier to work with. So I'm going to go ahead and copy this code chunk that's up here in the prompt.
And I'm going to come over to my RStudio window, and just to make it easier to see what's going on, I'm going to slide that over and make my font a bit bigger. I'm then going to open up a new R script and paste my code from last time into this window. If I highlight the whole thing with, say, Command-A, I can then click Run and it runs through those commands. What we'll see again is that we load the tidyverse library, which gets us that dplyr package and ggplot and things like that. For today, we'll just be using stuff from dplyr. We then read in the data set from GitHub. We do a bunch of renaming, and then we recode the oxford_or_not column we made as well as the singular_or_plural column that we made, and then it gives us output that tells us basically that everything worked fine. It's in red, which is kind of scary, but we look through this and everything looks great.

So like I said, I'm mainly going to be working down in the console, and I will go ahead and clear my screen, which I can do in RStudio with Control-L, to move things up. And actually, you know what, I'll probably just put my face up here in the upper right corner. So again, we can look at the github variable. It's a data frame that has 13 columns and, as we can see from up here, 1,129 different rows. So we've got about 1,100 respondents that answered these 13 questions for us, or not for us, for FiveThirtyEight. We just get to work with the data, which is fantastic.

So the first thing I want to talk about is the count function. We saw this last week and maybe the week before. One of the things I like to do is to kind of throw something out at you just to get you exposed to it, and then as we go forward, dig a little bit deeper into its use. Education scholars say that people actually learn better that way, having things slowly revealed and seeing things in different contexts. And so we've seen filter in different contexts.
We've seen count in different contexts, and today we're going to talk about it. Then next time I probably won't spend much time talking about it, but you'll see it in a different context, and that makes your brain, maybe not hurt, but stretch to better understand how to use it.

So we're going to start with count. I'm going to take our github data frame and pipe that to count, and the count function will count the number of times each value is used for that variable. So let me give you an example. If I put in oxford_or_not, I'm going to count the different values and the number of times they're used in the oxford_or_not column. What we get is a new data frame that lists non-Oxford and Oxford and the number of times they each show up. So this looks like a summary table, but it's actually a new data frame that's being generated by that count function. We see that people primarily use the Oxford comma over the non-Oxford comma. If we did something like github, count, singular_or_plural, we would see that most people, by about three- to four-fold, prefer the singular understanding of data over the plural. So people would prefer to say the data is interesting. And so those of us dinosaurs that like to think of data as being plural, perhaps we need to get with the times and realize that the language is shifting.

Okay, so another thing we can do with count is give it two variables separated by a comma, for it to do a contingency-type table, or a covariate table so to speak. So perhaps we could do github, count, singular_or_plural, and then add gender. Then we'll see that we have three gender categories: female, male, and NA. So this is probably more like sex than gender, whatever. And then we have singular, plural, and NA, where someone didn't respond, right?
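The two count calls described so far can be sketched roughly as follows. This is only an illustration: the rows are made up, and the underscore spellings of the column names and the lowercase value labels are assumptions, since the transcript gives them only as spoken. The data frame reuses the name github from the session.

```r
library(dplyr)

# Made-up rows for illustration only; these are not the real survey responses.
# Column and value spellings are assumed from how they are spoken in the session.
github <- tibble(
  oxford_or_not      = c("oxford", "oxford", "non_oxford", "oxford", "non_oxford", "oxford"),
  singular_or_plural = c("singular", "plural", "singular", "singular", "plural", NA),
  gender             = c("Female", "Male", "Female", "Male", "Female", NA)
)

# One variable: one row per distinct value, with the tally in a column named n.
one_way <- github %>% count(oxford_or_not)

# Two variables: one row per combination that occurs, including NA.
two_way <- github %>% count(singular_or_plural, gender)

print(one_way)
print(two_way)
```

Both results are ordinary data frames, so they can be piped into further steps just like the original data.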
And so we see that males were actually more likely to use the plural form of data than females, which, I don't know what that means, whereas females were more likely to use the singular form than males. Huh, interesting. So again, we can use count to get a sense of counts in a single category, a single variable, but also to look across multiple categories.

The other thing that I like to use count for is to better understand what the values in a column are. Say I did education; there's an education column, right? So if I did github and then piped that to count education, I now see, oh, I've got five different categories plus a no-response category. And so I could then begin to think, well, I want to work on the data from people with a bachelor's degree, or I want to work with people with a graduate degree.

Okay, so that gets us to the next question. How can I take my data frame and work with subsets of the population that responded to our survey, to look at how they responded to these different types of questions? That's where the filter function comes in. So I can now do github and pipe that to the filter function, and I'm going to give it an argument. I will say oxford_or_not equals-equals "oxford". This outputs a new data frame. It's a subset of the github data frame that only has 641 rows now, but the same 13 columns. In this case, everybody picked the Oxford comma sentence, and that's because I'm filtering on those rows where the oxford_or_not variable was "oxford".

Okay, so a couple of things to note here. A lot is happening within the argument for filter. oxford_or_not is the name of the column we're interested in. "oxford" is the value we want out of that column. And then we've got this double equal sign. This double equal sign is a special logical operator that tests whether the left side is equal to the right side. Okay, so oxford_or_not equals "oxford". So
sometimes that's true, sometimes it's false. If it's true, it's going to return the value TRUE. If it's false, it's going to return the value FALSE. And so if a row is TRUE, filter will say we're going to keep this. If the row value is FALSE, it's going to say no, we're not going to include that in the new data frame that we create. Okay.

So I can up-arrow here and then add count oxford_or_not, to count the number of times Oxford or non-Oxford is used in my new data frame that's been filtered. And we see that our data frame, sure enough, only has responses where people used the Oxford comma in the sentence that they chose. Excellent, it works.

All right. So a couple of things to note. If we go back to this filter function, there are a few places where I, and probably you, fall into error. One problem is that this double equal is kind of a unique thing. As I mentioned, it's a logical operator. Most of us are used to thinking of a single equal sign. If I use a single equal sign, what I'm really doing is trying to make a variable called oxford_or_not and assign the value "oxford" to it, which isn't really what we want to do. We want to ask, are these the same, which is a logical question. So the error message here is actually useful. Usually in R the error messages are just really painful and not helpful. And so it says, do you need the double equal sign?
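To make the TRUE and FALSE idea concrete, here is a small sketch on made-up rows. The column spelling oxford_or_not and the value labels are assumptions, and the exact wording of the error raised for a single equal sign depends on the dplyr version, so the sketch only checks that an error occurs.

```r
library(dplyr)

# Made-up rows for illustration only; spellings are assumed, not taken from the real data.
github <- tibble(
  oxford_or_not = c("oxford", "oxford", "non_oxford", "oxford")
)

# The double equal sign returns one TRUE or FALSE per row.
keep <- github$oxford_or_not == "oxford"
print(keep)

# filter() keeps only the rows where that test is TRUE.
only_oxford <- github %>% filter(oxford_or_not == "oxford")
print(only_oxford %>% count(oxford_or_not))

# A single equal sign is treated as a named argument, and filter() stops with an error.
single_equals <- tryCatch(
  github %>% filter(oxford_or_not = "oxford"),
  error = function(e) e
)
print(inherits(single_equals, "error"))
```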
It's like, ah, sure enough, I do, right? So I can go back, put in the double equal sign, and get the right result.

Another problem is that sometimes I will leave out the quote marks around oxford, and it will say object oxford not found. The word object is R's way of representing what we think of as a variable or as a column in a data set. So it's trying to look for a variable called oxford, but we're not looking for a variable called oxford. We're looking for a string, or text, that is the word oxford. And so that's why we need to wrap it in quotes. We can wrap it in single quotes or double quotes, as we did before, and we get the same result. The key is that if we start with a single quote, we need to end with a single quote, and if we start with a double quote, we need to end with a double quote. Okay, great.

So the double equal sign, again, tests whether the left is equal to the right, and we get a TRUE or FALSE from that. Alternatively, we could use exclamation point, equal sign. Whenever you see the exclamation point, in logical terms that is the word not. So oxford_or_not not equal to "oxford". And if we pipe this to count on oxford_or_not, what should we get as our output? Can you guess? We're going to get non-Oxford and the 480 responses that came from that. Okay, awesome.

So there are a lot of other logical operators that we can use, things like greater than, less than, greater than or equal to, less than or equal to. But our data, as I mentioned earlier, is all categorical.
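The quoting rule and the not-equal operator can be tried on a toy example like the one below. The rows are invented and the spellings oxford_or_not, "oxford" and "non_oxford" are assumptions made for the sketch.

```r
library(dplyr)

# Made-up rows for illustration only; spellings are assumed.
github <- tibble(
  oxford_or_not = c("oxford", "non_oxford", "oxford", "non_oxford", "non_oxford")
)

# Single or double quotes give the same result, as long as they match.
single_quoted <- github %>% filter(oxford_or_not == 'oxford')
double_quoted <- github %>% filter(oxford_or_not == "oxford")
print(isTRUE(all.equal(single_quoted, double_quoted)))

# != keeps the rows where the value is NOT "oxford".
not_oxford <- github %>%
  filter(oxford_or_not != "oxford") %>%
  count(oxford_or_not)
print(not_oxford)
```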
It's all text data, and so we don't have any quantitative, numerical data to test out those other logical operators. We'll have to wait for another data set in a future session to work with those. But this gives us a great opportunity to try to build more complicated queries in our filter function argument.

So the first thing that we're going to think about is, well, maybe I want to know about people that are really pedantic about language: they use the Oxford comma and they think data should be plural. Okay, so there are two things there, and we want both to be true. With what I've already taught you about the filter function, you should be able to figure this out, or at least a preliminary way of figuring it out. So what we're going to do is take github, pipe that to filter, and do exactly what we've already done: oxford_or_not equals-equals, quote, oxford. So this gets us this data frame with all the respondents who picked the Oxford comma, right? We can then pipe this to another filter function, where we say singular_or_plural equals-equals "plural".

The other thing I forgot to mention is that it is very case specific, right? So I started to type capital-P Plural. There is no capital-P Plural in this data set, so nothing would have matched. So you have to spell it right, and you have to get the capitalization right as well.

Okay, so now this returns for us a data frame with 135 rows of people that use the Oxford comma and think data should be plural, right? We can double check that by adding to the end count oxford_or_not, comma, singular_or_plural. And we see, sure enough, we get 135 rows and they're all Oxford and plural. Okay. So syntactically this works, right?
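A minimal sketch of the two chained filter calls, plus the capitalization pitfall, might look like this. The rows are invented, and the column and value spellings are assumptions.

```r
library(dplyr)

# Made-up rows for illustration only; spellings are assumed.
github <- tibble(
  oxford_or_not      = c("oxford", "oxford", "non_oxford", "oxford", "non_oxford"),
  singular_or_plural = c("plural", "singular", "plural", "plural", "singular")
)

# Two filter steps in a row: a row has to pass both to survive.
picky <- github %>%
  filter(oxford_or_not == "oxford") %>%
  filter(singular_or_plural == "plural")

# Double check: only the oxford / plural combination should remain.
print(picky %>% count(oxford_or_not, singular_or_plural))

# Matching is case sensitive, so a capital P matches nothing in this toy data.
wrong_case <- github %>% filter(singular_or_plural == "Plural")
print(nrow(wrong_case))
```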
We get the right answer. That's all I care about. But sometimes I care about more. Sometimes I want my code to be a little bit simpler, not as verbose. So I'm going to show you all how we can combine these two filter functions into one, to kind of tidy up the syntax of it. There are two ways that we can do this.

So we can take github and pipe that to filter, and then do oxford_or_not equals-equals "oxford", spell it right, comma, singular_or_plural equals-equals "plural". And I'm going to go ahead and add this count, right? What you should see is that we get the same output by putting the two logical questions together in the same set of filter arguments, separated by a comma. What this comma says is that this expression for oxford_or_not has to be true and this singular_or_plural question also has to be true. If either of them is false, the whole row is excluded from the new data frame that we're going to generate. Okay.

So the comma is nice, but to me it doesn't make as much sense as the other option that we can use, which is an ampersand. So I'm going to replace this comma with the funny character over the seven, which for the life of me I cannot draw by hand. I have to use a keyboard to generate that ampersand. I think normally I do like a plus or something if I'm writing by hand. So that ampersand is the word and, right? If we run that, sure enough, we get the same data frame back that we had initially, right? Great.

One of the other reasons I like to use the ampersand is that it works well as a complement to its opposite, the or operator. So say we want to know, well, what are the demographics of people that use the Oxford comma
or that use data as a plural noun? Well, we can replace that and character with a vertical line. If you look at the key to the right, above the Return key, that's a backslash. If you hit Shift-backslash, the character that's generated is this vertical line, which in R is expressed as or.

So is oxford_or_not equal to "oxford"? Okay, if it's true, that's good. If it's false, well, we'll keep that in mind. Is singular_or_plural "plural"? Is that true or false, right? If either thing on the left or right of that vertical line is true, we're going to keep that row in the new data frame we generate. If both of them are false, we will not include it. Okay, so if either is true, or if both are true, we keep it. If neither is true, we're going to throw it away.

And so what you see then, when we tabulate the number of cases of each, is that we get non-Oxford and plural, Oxford plural, Oxford singular, and Oxford NA, right? But we don't have a non-Oxford singular, because non-Oxford here on the left would be false and singular here would be false. So both would be false, right? Those get thrown out, and they're not carried on into the next data set. With me? Great.

So I want to take it one step more complicated. Sometimes I might want to mix and questions with or questions, right? So I might say, give me the rows from females that use the Oxford comma or use data as a plural noun, right? And so it gets a little bit more complicated. So if we do this: github, pipe that to filter, and we want gender equals "Female", I think it was capital-F Female, and, right? So I want females and those who use one or the other, right? So I could then do oxford_or_not equals "oxford", or, singular_or_plural equals "plural". And then I'm going to count oxford_or_not and singular_or_plural, okay? So looking at this syntax, I'm a little bit confused. What goes first, right?
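The comma, ampersand and vertical-line forms, and the question of what goes first, can be explored on a toy example like the one below. The rows are invented and the spellings are assumptions. One fact the sketch relies on is that R evaluates & before |, so without parentheses the mixed expression is read as (gender and Oxford) or plural.

```r
library(dplyr)

# Made-up rows for illustration only; spellings are assumed.
github <- tibble(
  gender             = c("Female", "Female", "Male", "Male", "Female", "Male"),
  oxford_or_not      = c("oxford", "non_oxford", "non_oxford", "oxford", "non_oxford", "non_oxford"),
  singular_or_plural = c("singular", "plural", "plural", "plural", "singular", "singular")
)

# A comma and an ampersand both mean "and": every condition must be TRUE.
with_comma <- github %>% filter(oxford_or_not == "oxford", singular_or_plural == "plural")
with_and   <- github %>% filter(oxford_or_not == "oxford" & singular_or_plural == "plural")

# A vertical line means "or": a row is kept if at least one condition is TRUE.
with_or <- github %>% filter(oxford_or_not == "oxford" | singular_or_plural == "plural")

# Mixing the two: & binds more tightly than |, so parentheses change the result.
no_parens <- github %>%
  filter(gender == "Female" & oxford_or_not == "oxford" | singular_or_plural == "plural")
with_parens <- github %>%
  filter(gender == "Female" & (oxford_or_not == "oxford" | singular_or_plural == "plural"))

print(no_parens %>% count(gender))
print(with_parens %>% count(gender))
```

In this toy data the version without parentheses lets male rows through because they use the plural, while the parenthesized version returns only female rows.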
I think it goes left to right, but I'm not really sure, right? If it goes left to right, it's going to say give me gender equals "Female" and, so, females that use the Oxford comma, or people that use plural, which isn't exactly what I want. I want females who use the Oxford comma or plural. So whenever I'm not sure about the order of operations, what I like to do is wrap the statement in parentheses. If I go ahead and wrap this in parentheses, I now get what I want, which is oxford_or_not and singular_or_plural, right? I suppose I could also add gender as a column, just to prove to myself that everybody's a female. And sure enough. I can also demonstrate what would happen if I didn't have these parentheses: I should get male rows. And sure enough, within gender I do have male rows. So again, those parentheses are really important for directing how the calculations are done. Great.

So I think pretty much everyone is back. Hopefully you had good discussions in your breakout groups about the different exercises. I'm going to go ahead and reshare my RStudio screen. I think this is sharing my screen now. So would anybody like to volunteer how they went about solving the first question, of which geographic region was the best represented in the survey?

I can go ahead.

Sure. Do you want to share your screen so you can show us?

Yeah. Okay. Let me do that.

Go ahead.

Is the font size big enough?

Looks good.

Okay. So I guess it was just count for the location, and then it will show the different locations and the number of respondents from there. I was actually thinking of sorting them after, but I don't know, is there a specific function in tidyverse for that or not?

There is. That's a great question.
So at the end of your line 85, you could add a pipe character and then the function, not sort, but arrange.

Oh.

And then if you put in parentheses n.

n?

Yep, like the column name you have there now. If you run that...

Oh, yeah, I see. Okay.

Yeah. So that's an ascending sort. So inside of arrange, to the left of your n there...

The left side?

Yep, and then type d-e-s-c, like the first four letters of descend, and not a comma but a parenthesis.

Oh, I see. And then a closing parenthesis?

Yeah.

Oh, now it's a descending sort. That's cool.

Yeah, good job. Good question, and it wasn't too hard to add.

Yeah, great.

Would someone else like to show the second question, of how many respondents cared about grammar? Don't need to be bashful. All right, I can help out with that. So again, my thought process, well, we talked about this before, the importance_of_grammar column. We could take github and then pipe that to count importance_of_grammar. This shows us the different categories. So I'm going to think that the people that thought it was somewhat important or very important are who I want to focus on. So I'm going to then do github, filter, importance_of_grammar is somewhat important...

Okay... can you share your screen?

Uh, yes. Thank you. Thank you very much.
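The sorting step worked out in the exchange above can be sketched as follows. The location labels and rows are invented for illustration, and the column name location is assumed from the spoken description.

```r
library(dplyr)

# Made-up rows for illustration only; the region labels are placeholders.
github <- tibble(
  location = c("Pacific", "Mountain", "Pacific", "New England", "Pacific", "Mountain")
)

# count() names its tally column n, so that is the column to sort on.
ascending <- github %>%
  count(location) %>%
  arrange(n)

# Wrapping the column in desc() flips the order to largest first.
descending <- github %>%
  count(location) %>%
  arrange(desc(n))

print(ascending)
print(descending)
```

For this particular case, count also accepts sort = TRUE, which orders the result from the largest count down without a separate arrange step.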
No, thank you for saying something.

Yeah, thank you.

All right. So hopefully that looks better now. So as I was saying, the first thing I would do is count the number of different values in that importance_of_grammar column. And so then I got neither important nor unimportant (neutral), somewhat important, somewhat unimportant, very important, and very unimportant. Something to notice is that these values are listed alphabetically. So what I'm going to do then is filter: importance_of_grammar equals-equals somewhat important, or, importance_of_grammar equals-equals very important. And so this returns a data frame that has 1,021 rows. Okay.

And so then the question is, among those respondents that cared about grammar, did they have a preference for the Oxford comma or for using data as a plural noun? So I'm going to copy this line, my line 32, down, and at the end of it I'm going to put a pipe and count oxford_or_not. Okay. So previously we counted perhaps the number of times people said somewhat important or very important, but now I'm going to count on a different column, the oxford_or_not column. And so this tells me that among the people that thought grammar was important, which hopefully is everyone, 584 used the Oxford comma and 437 did not use the Oxford comma. Okay.

And again, just another point: in this tutorial today I've been writing these as one-line series of commands, one-line pipelines. But if you've got a pipe, then at the end of the line, after the pipe character, you can put in an Enter to get a new line. And so now we make it clear that there's the data, github, going into these two different functions. And if we're up here in our R script, we can click Run and it will run those three lines for us to get the same output.

Okay, so the question that I came up with was whether or not
people that were older were more likely to use the Oxford comma or not. Okay, and so I'm going to look at age. So take github, pipe that to count age. And what we see is that age wasn't a numerical value. It's a categorical variable, right? They put people into age ranges, or buckets, based on their age. So I'm going to define people that are basically 45 and older as being, quote, old. I am not there yet. So I will do github, and then filter, and I will do age equals-equals 45 to 60, or, age equals-equals greater than 60. I should have a space there.

So another classic error that I make is that I might put and in there, right? Because I want the people that are 45 to 60 and over 60, right? But what that does is require that the age variable have both of those values, and nobody can be both between 45 and 60 and over 60. So we need that vertical line that says or: we want people that are between 45 and 60 or that are greater than 60. And we can make sure this worked with count age. We see that we had 272 people greater than 60 and 290 between 45 and 60.

But I want to know the usage of the Oxford comma among this older demographic. So instead of age, I'm going to do oxford_or_not. And so I see that among this demographic, 297 people did not use the Oxford comma and 265 did use the Oxford comma. And so we might say, well, what was the overall Oxford comma usage like again? So we could do github and then pipe that to count oxford_or_not. And so we can see that there are more people in the whole study that used the Oxford comma than did not, but among our older group more people did not use the Oxford comma than did. So if you want to stay young, use the Oxford comma. That's probably not quite true.

So anyway, did anybody else try a different type of question? Or what kinds of questions were people able to get to, thinking about other questions you could ask?

I'm sorry.
I have a question about something. So if I'm filtering, for example filtering the age, you filtered for, I guess, the 45 to 60 and above 60. Do you have to mention the column again in the code itself, or can you just have a pipe, or, and without the column? Is it required in the function?

Right. Yeah, so if what you're asking is, can we do github and then pipe that to filter age equals-equals 45 to 60, and then or, greater than 60. So this doesn't work in R, right? Okay, so you have to say age equals-equals greater than 60 for it to work.

Even if you have it as, I guess, like a c, parentheses, if you have them concatenated somehow? Like age equals-equals c, parentheses, and inside you have the 45 to 60, and then comma, above 60.

So you're saying, we'll see how this goes, if we do x equals age equals, so c...

Or, yeah.

Yeah, I'm not... Yeah, this is probably the easiest way to do it. I'm not sure. I think I see where you're trying to go, but it's too much. Something you could do would be to say, say you didn't know 45 to 60, say you had like x is 45 to 60. You could then do like github, filter, age equals x. So if you do that, then we should get all the... oh, I have to do that. And if I count age, we see that we've got all the people 45 to 60. But as far as making this its own variable, I'm pretty sure you can do it, but I don't know how to do it off the top of my head. And it's probably one of those cases where you can do it, but it's probably harder than it's worth trying for.

Okay, I see.

Great. Any other questions or comments or thoughts?

We had a different approach, which was either unsuccessful, or it was successful but weird.

Okay.

We wanted to see the socioeconomic impact on the usage of the Oxford comma or not. So I can share my screen.

Yeah, let me stop sharing mine and you can show yours. Great.

Okay, so can you see the screen?
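On the earlier question about listing several age values without repeating the column name: one approach that is not shown in the session, but that R supports, is the %in% operator, which tests whether each value appears in a vector of allowed values. The sketch below uses invented rows, and the age labels "45-60" and "> 60" are assumed spellings of the categories as spoken.

```r
library(dplyr)

# Made-up rows for illustration only; the age labels are assumed spellings.
github <- tibble(
  age           = c("18-29", "45-60", "> 60", "30-44", "45-60", "> 60", "> 60"),
  oxford_or_not = c("oxford", "non_oxford", "non_oxford", "oxford", "oxford", "non_oxford", "oxford")
)

# The form used in the session: repeat the column name on both sides of the or.
with_or <- github %>% filter(age == "45-60" | age == "> 60")

# An alternative: store the labels in a vector and test membership with %in%.
older <- c("45-60", "> 60")
with_in <- github %>% filter(age %in% older)

# Using and instead of or returns nothing, since no row holds both labels at once.
with_and <- github %>% filter(age == "45-60" & age == "> 60")

print(with_in %>% count(age))
print(nrow(with_and))
```

Writing age == c("45-60", "> 60") is not a substitute, because == compares the two vectors position by position rather than checking membership.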
Yep, great.

So we wanted to filter out the graduate degree in comparison with a much lower degree. We tried them all, but we would only get a graduate degree using the oxford_or_not. So you see I've hashed out stuff that we had on the...

Yeah, so can I give you a hint?

Yes.

So on line 112, to the right of your vertical line, you have importance_of_grammar rather than education.

Oh. Thank you. Yeah, it's always the little things. Oh wait, sorry. So... oh, that was the problem. Okay.

But that's great. I mean, the syntax is right. The hard part is thinking of the question and then laying out how you would want to answer it, and you've got that right. Like the syntax, the typo, I mean, professional programmers do that. But the hard part, and I'm not lying to you here, the hard part really is thinking about the logic and the flow of asking a question and thinking about how you would answer it. And you did that. You just got one word wrong. So that's great. That's good.

Thank you.

Well, this is the top of the hour, and I really appreciate you coming back this week. If you have any comments or suggestions, feel free to drop me an email. It's at umich.edu. I want these to be as useful for everybody as I can. It's hard to serve everybody's interests, but we'll see what we can do, and I want to keep these interesting for people. I like doing them, so hopefully you like coming back and doing them. So with that, thank you. Stay healthy, stay safe, and have a great week. Take care.