So we'll go ahead and get going. For those of you that haven't met me, my name's Pat Schloss. I'm a professor at the University of Michigan. I live outside of Ann Arbor, so I'm coming to you from my office here in Dexter, Michigan. We have chickens and turkeys and pigs and sheep and cows in the backyard here. So I don't know that you can see into my back window here, but you might hear a turkey gobbling or something. My six-year-old has some pet turkeys. So one of the things that's been interesting the last couple of weeks with doing these Zoom calls and BlueJeans or FaceTime or whatever is seeing everybody's pets kind of wander across them during these talks. So don't be surprised to hear some farm animals in the background. So thanks again for your interest in hanging out for an hour or so. My plan is for this to go an hour, but if we have some kinks and it goes over a little bit, I hope you don't mind. The idea of this is what we in my lab call a code club. And this is something we've been developing over the last few years as a way to help people in our group develop computational skills and learn to program. And so we have people that joined the lab as really great bench scientists and they need to kind of progress to feel pretty proficient with using R and mothur, of course, and all sorts of other tools from the command line. And so this has been something that we've iterated on over the years to get to a point where people feel really comfortable in programming. And so I've thought for a while about, well, how could I deploy this at a larger scale with people that aren't necessarily in my group? So my group has anywhere between six and 10 people at any given time. I see we're up to 34 people. So this will be a very interesting experiment here.
One of the goals also of this is that I get the sense, from people in my own group and people I've talked to, that in these days of working from home, working remotely, when we don't have experiments to do at the bench and we don't necessarily have all the same meetings to go to, people kind of need some type of structure. And so if I can help provide that, then that's a bonus, right? So my plan is to hold these Thursdays at three. And again, depending on the interest, I mean, if people want more of them or less of them, then I'm happy to adjust as well. The other thing that I also want to do is kind of help to overcome some of the social isolation. One of the nice things about Zoom, and I think some of the other platforms do this also, is that I'm gonna be able to put everybody into a breakout group. And so I can put people into groups of two or three. And then within your group, you can talk about the code and you can solve the exercises I'm giving you together. And so in that way, what we've found in our research group is that we might have more experienced people that help the less experienced people. But we also set it up in a way so that one person is typically typing and the other person is telling them what to type. And so you might be some great computer programmer, but when your hand's on the keyboard, your mouth is sealed. The junior person, so to speak, is gonna tell you what to type. And so it's really important that people adhere to that. This is not a competition where you have to get through all the exercises and you have the best code. The goal here, again, is to break out of kind of our social isolation, talk to each other, get to know people from around the country, around the world maybe. And also to perhaps pick up some R skills. There's a lot of resources that are on the website that we're hosting this through. Over the years I've developed this minimal R tutorial using R to analyze 16S amplicon data.
We've got on there a tutorial on reproducible research practices. We've got this code club. And then next week or the week after, I'm gonna start trying to do some live streaming of data analysis projects. My goal with this also is to keep it light. I don't wanna analyze COVID data. And I also don't really wanna analyze microbiome data and get all serious about everything. There's enough seriousness out there. If I can help provide some levity, then again, that'd be great. Looking at my notes here. Also at the end of this, I will post the video to YouTube. I will also post the solutions. So if you have to drop out or if you maybe can't make it next week or you have a friend that wanted to participate, let them know that the resources will be available to them later. The other thing that I try to do with materials I present is to not make it so much of a tutorial. If the question is "I wanna know how to make a dot plot or scatter plot," I'm not gonna tell you how to make a scatter plot. What I'd rather do is say, I have this question. And how will we answer that question? And so I mean, there are a lot of resources out there for the kind of "how do I do X?" But there's not a lot of resources out there for "how do I answer a question?" So that's kind of a niche that I try to fill. And I mean, I think there's many ways to approach all this. So here in a bit, I'm gonna break you up into groups, as I said. Hopefully this works. I see we're up to 40 participants. That's just really awesome. I'm really impressed. So when you get into your group, be sure you introduce yourself to the people you're working with. Tell them who you are, where you're from. You don't wanna work with a stranger. And if you come back next week, I can't promise that I'll get you back with that partner. So maybe you wanna exchange some contact information just to keep in touch with them if that's something that's interesting to you.
So there's a big rule that my lab has really found to be the most important thing in all this when we're doing these types of code clubs: do not be a jerk. We know that there are people at varying levels of proficiency in using R or using computers or anything. Do not make your partner feel bad for not being as proficient as you are. Lift them up, right? Help them out. My hope is that if you're somebody that's not as proficient or more of a beginner, that you'll learn perhaps from more experienced people. If you're a more experienced person, the best way to learn something even better is to have to teach it to someone else. And so do not be a jerk. I have a code of conduct up there that I really hope I don't have to enforce, but I just can't emphasize this enough. Do not be a jerk. We're here to escape just all the crap that's out there in the world, so just don't be a jerk. As we go through, I will use Zoom to do the breakouts again. There's a feature in the breakout rooms where you can, I think, raise your hand and I can duck into your group and give you information. You can also share your screen with your partner. That might work well, but at some point I'm gonna have you switch who's in control. And so it might be that partner one has exercise one on their version of RStudio and partner two has exercise two on theirs. There's a way to control the other person's screen, but I understand that people might be weirded out by someone else controlling their screen. So we'll just have to experiment and see how that all goes. I'll send out a message reminding you to switch and I'll also send out a message kind of giving you a five minute heads up about when to come back. At the end, I'll try to build time in so that we can come back and debrief. With so many people, unfortunately, I don't think we're gonna be able to get everybody to give feedback. So that's kind of where we're going.
We'll hopefully figure this out together and we'll learn some R along the way. So before I get going too far, are there any quick questions about logistics here? All right, hearing none. So today, hopefully you've found the site. About an hour ago I updated it with the exercises, and the data that we're using is some survey data that 538 collected looking at people's preferences for different types of candy. And let me share my screen with you just so I can kind of show you what I'm looking at. And so this is the page for the Candy Crush Code Club today. And you'll see there's three exercises here and there's a homework exercise or kind of a stretch goal for other people to work on. The question is coming from a 538 article which they call their Ultimate Halloween Candy Power Ranking, where they collected information on, I forget how many, but a good number, maybe 85 or 86 different types of candy. I don't know anybody that loves Good & Plenty. I think my mom likes Good & Plenty. Anyway, they collected different characteristics on the candies, and the goal of the author of this was to kind of build a Franken-candy of what would be the perfect candy that everybody would love. And of course, that's kind of a silly question. I have a very different set of questions. And so that's what I'm going to lead you all through with these activities. So I'm a big fan of what he called pluribus candies, or bite-sized candies like M&Ms, Skittles, Hot Tamales. I was really bummed that Hot Tamales did not make it into their survey. So with all the schools closing and everything, I have a daughter who goes to Michigan Tech up in the Upper Peninsula of Michigan. She had about $100 left on her dining account that she could spend anywhere. And my good daughter got me about 50 bags of peanut M&Ms. And so I'm basically sitting back in my room these days doing work eating peanut M&Ms. So when I go back to campus, I'll probably be in pretty bad shape.
So hopefully not really. So my question is, well, what other candies are like peanut M&Ms? But I also like Skittles. So I'm trying to think about bite-sized candies and what are the factors that people like about bite-sized candies and kind of what are the demographics of bite-sized candies and things like that. So that's what I'm going to have you work on. I've got three questions for you here in exercises one, two and three. They kind of build in complexity a bit. In this first one, I've given you a chunk of code that counts the different characteristics of these pluribus, or bite-sized, candies. So this chunk of code works. You'll see, if you know R, that this pound sign to the right here is a comment sign. And so your job is to fill in what that line does. If you copy this code chunk into RStudio, you can then run these two lines to figure out, say, what the filter line does, what the filter function does. And you can kind of expand down to understand what's going on. And so my suggestion is that to the right of these pound signs, you then indicate what that line of code does. Exercise two is where I've taken another set of code chunks where I generated a strip chart comparing people's preferences for chocolatey bite-sized candies, like peanut M&Ms and regular M&Ms, and fruity bite-sized candies. And so I want to know what do people like better in terms of bite-sized candies? Chocolatey candies or fruity candies? I like both M&Ms and Skittles pretty well. And so there's code here that generates this strip chart. Unfortunately, I hit the sort function in my RStudio editor and I've now alphabetized my lines of code. And so it's your job to take these lines of code and break them up into the three different code chunks that result in building the plot. So all the code's here, there's nothing you have to add, there's nothing you have to take away. You just have to figure out how to unjumble my code for me.
And finally, now that you've unjumbled that code, I want you to modify it to answer a different question. Namely, are chocolate candies more expensive as a bar or as a bite-sized piece? And so hopefully this makes some sense. And again, hopefully you were able to get the setup working. The first step that I'll help you with is if you right-click here and copy these two commands. In RStudio, I'm going to make a new R script. And I'll copy that in here. And so to run lines of code, I can highlight them and hit this run button. And this will then load the data from 538 into my R console, okay? And so from here, you should be off to the races. And I see that there's a comment. Ah, sorry, let me share with you my RStudio. I should just share my desktop, let's do that. All right, so hopefully you can see my RStudio now. Sorry about that, thanks for bringing that up, Matthew. And so again, if you copy and paste those two lines into here, you can highlight all these lines with, say, your mouse and hit run, and it will then run that, okay? So before I send you off into breakout groups, are there any questions that people have? You're a quiet group, all right. So at this point, let's see, we're still figuring out how this all works. Just give me a second here to stop, okay, stop share. And so what I'm gonna do now is go ahead and split people up into breakout groups. It looks like we'll have groups of two, and maybe a group of three or so. And I'll give you about half an hour to work through these three exercises. I think there's enough there to play around with. Again, try to understand what's going on. And I think these exercises will help you to figure out what's going on in these various lines of code, okay?

And hopefully that worked well. I popped in on a few groups, and it seems like people were having good conversations and just wanted me to go away.
And hopefully you enjoyed talking to someone else about these exercises. So what I'd like to do in the remaining time is to come back and maybe have a couple of groups share what they did for the different activities. Let me share with you my RStudio window. And it would be good to have somebody, if someone would like to volunteer, you could raise your hand I think, and maybe talk us through what you added for comments for exercise one. Any volunteers wanna raise their hand out there?

Sure, I can talk about what my group discussed.

Okay, and who is this?

This is Gavin.

Okay.

So essentially we mainly focused on the first exercise because we weren't too familiar with the syntax, but it looks like the first line was for piping the table to the subsequent steps. And then restricting the rows to be these pluribus candies only. And this pivot_longer command, this is for converting from a wide table of many columns to a longer one of many rows instead. I'm not sure exactly how to succinctly summarize it in one single comment, but it has a new column, type, and a new column, answer, with their values. The next line, only keep descriptions that are positive.

Or true, we might say, right?

Yeah, yeah. And the next one, group rows by their types for the subsequent analysis. And then that subsequent analysis is to get the number of characteristics that are distributed across all candies. So yeah, the instances of the characteristics.

Okay. Great, good job. So, the comments on the syntax. We're using a library up here called the tidyverse, an R package called the tidyverse. And it's a set of functions that are built around this idea of tidy data, where you tend to have longer tables rather than wider tables. And so this funny character, the percent greater than percent, is called a pipe.
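The steps Gavin walked through can be sketched roughly as follows. This is a minimal sketch, not the exercise code from the site: the tiny `candy` table and its values are made up for illustration, and the column names (`pluribus`, `chocolate`, `fruity`, `hard`, `type`, `answer`, `total`) are assumptions based on what was said in the session.

```r
library(tidyverse)

# Hypothetical toy table; these are NOT the real 538 survey values
candy <- tibble(
  competitorname = c("Candy A", "Candy B", "Candy C", "Candy D"),
  chocolate = c(TRUE, FALSE, TRUE, FALSE),
  fruity    = c(FALSE, TRUE, FALSE, TRUE),
  hard      = c(FALSE, TRUE, FALSE, FALSE),
  pluribus  = c(TRUE, TRUE, FALSE, TRUE)
)

trait_counts <- candy %>%                    # pipe the table into the next step
  filter(pluribus) %>%                       # keep only the bite-sized rows
  pivot_longer(cols = c(chocolate, fruity, hard),
               names_to = "type",
               values_to = "answer") %>%     # wide table -> one row per candy/trait
  filter(answer) %>%                         # keep only the traits that are TRUE
  group_by(type) %>%                         # one group per trait
  summarize(total = sum(answer))             # count the TRUE rows in each group

print(trait_counts)
```

Running the pipeline one line at a time (dropping the trailing `%>%`) is a reasonable way to see what each step contributes.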
And you can think of this as piping data through these various functions. So if you're able to get through that, that's great. And if not, no worries, you'll get there, trust me. I personally like the pipes because otherwise you're writing one command after another. And here, at least for me, the way I kind of make my own mental model of how this works is, again, with that kind of flow of data through these various steps. Great. Any other comments about exercise one? Okay. Yeah, Jacob.

All right. One thing, actually, in exercise one that we noticed: we were wondering what would happen if we took the group_by line out.

Okay.

And we found that then summarizing with the total sum gave us just a total number of candies that were pluribus. And that was 96 or something.

Cool. So yeah, let's try that out. So the nice thing about RStudio is if I have my cursor in this code chunk, I can then hit run and it will run that chunk of code down below, right? So the way it was written, we get the count for each of the different types of candies that also were pluribus. And so you said that you removed group_by. So what I could do is, I think, put in a comment sign to comment out that row, and then I can again put my cursor in there and hit run, and then I get a total, right? So you get the 98. And so, yeah, you're right. This is a good way to figure out what the different functions are doing in the pipeline. And the group_by, if you were to look at the output of just getting through the group_by, you wouldn't really notice much difference. But what it is doing is it's taking your data frame and it's breaking it up into those different groups by type. And then within each type, it then summarizes to generate that total column in the new table that's generated. Now I've somehow created a bug somewhere in here. There we go. Great.
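Jacob's experiment of removing group_by can be reproduced on a small scale. Again this is only a sketch with a made-up table and assumed column names, not the real data, so the totals here will not match the ones seen in the session.

```r
library(tidyverse)

# Hypothetical toy table; these are NOT the real 538 survey values
candy <- tibble(
  competitorname = c("Candy A", "Candy B", "Candy C", "Candy D"),
  chocolate = c(TRUE, FALSE, TRUE, FALSE),
  fruity    = c(FALSE, TRUE, FALSE, TRUE),
  hard      = c(FALSE, TRUE, FALSE, FALSE),
  pluribus  = c(TRUE, TRUE, FALSE, TRUE)
)

# Shared first steps: bite-sized candies, one row per TRUE trait
long_traits <- candy %>%
  filter(pluribus) %>%
  pivot_longer(cols = c(chocolate, fruity, hard),
               names_to = "type", values_to = "answer") %>%
  filter(answer)

# With group_by: one total per trait
by_type <- long_traits %>%
  group_by(type) %>%
  summarize(total = sum(answer))

# Without group_by: a single grand total across all traits
overall <- long_traits %>%
  summarize(total = sum(answer))

# Number of distinct bite-sized candies, for comparison
n_candies <- candy %>% filter(pluribus) %>% nrow()

print(by_type)
print(overall)
print(n_candies)
```

In this toy table the grand total is larger than the number of candies, because a candy with two TRUE traits contributes two rows after pivoting.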
So exercise two. I think what we might be able to do is we could have somebody share their screen. Yeah, so Peter notices that one funny thing about this approach to summarizing is that some candies like Gobstoppers might be counted three times, right? And so some candies hit multiple different things. So I think a Snickers bar would get, like, peanuts, chocolate, and nougat. And so, yeah, if you're doing a statistical analysis, you couldn't necessarily do a head-to-head comparison because not everything's independent. Would somebody like to share their screen to show exercise two? Katherine, are you volunteering?

Yep.

Awesome.

Okay, so I'm just opening up RStudio. Okay, is that sharing properly?

It looks awesome.

Okay, perfect. So this was a bit of a mad dash for our group because we had a bit of a late start, so the countdown had started. And the way that we went about it was first by looking for a line that had the candy data set in it, because that is our starting data set and it is being assigned to the new data set, pluribus data, and that went to the beginning. And then the next step was finding anything that looked like it was involved in plotting this data, anything that's ggplot or geom or the scales or anything like that, and putting it at the end, and that really reduced the number of lines of code that we had left to sort out. There was a little bit of confusion because the filtering specifically only looked at chocolate and fruity here and we weren't sure whether it was supposed to be all pluribus candy or not, so we were looking for another filter line of code, but since there wasn't one, we went with this one. And then similarly to the first exercise, the same pivot_longer, just slightly different, and then filtering down to where it is true. Shall I continue?

Yeah.

Okay, so any questions so far, anything unclear? No?
Okay, so for the next little chunk, it was just summarizing, so again very similar to the first exercise, taking our new pluribus data frame and grouping by type and summarizing, very similar to the first but this time showing the median win percent and the IQR. And then taking that data frame again and putting it into ggplot to show, I started on the third one but I'll just run that again, on the x-axis the type, chocolate or fruity, on the y-axis the win percentage, and then coloring it by whether it was a hard candy or not, true or false. And then geom_jitter is next, to not have the points all in a single column, and then setting the x-axis to break by chocolate or fruity and labeling it, and setting the y-limits to be zero and 100 for the percentages, and then adding labels and the title and the theme.

Excellent, well done. So I think that looks about like what I had intended. So one thing that you commented on up on line 19 was the syntax of the filter: pluribus, and, then in parentheses, chocolate vertical bar fruity. And so filter looks at the insides and what it's hoping for is to get a series of TRUEs and FALSEs, and if it's true, it keeps the row. If it's false, it basically throws out the row. And so what this is saying is, is pluribus true and are chocolate or fruity true? And so what you'll get back is all of the pluribus candies that are also either chocolate or fruity, if that makes sense. And so you can again use that ampersand to indicate a logical and, or the vertical line, which is above your return key and below your delete key on your keyboard. That vertical bar is sometimes called a pipe, but that gets confusing because we've got this other pipe here in R, but that vertical bar means or, right? So this or that, is this or that true? And if either is true, then it's true, whereas with and, both things have to be true.
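The and/or logic inside filter can be checked on a few rows. This is a small sketch with made-up rows and assumed column names. It also shows why the parentheses matter: in R, `&` binds more tightly than `|`, so leaving them out changes which rows come back.

```r
library(tidyverse)

# Hypothetical toy table; these are NOT the real 538 survey values
candy <- tibble(
  competitorname = c("Candy A", "Candy B", "Candy C", "Candy D", "Candy E"),
  pluribus  = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  chocolate = c(TRUE,  FALSE, FALSE, FALSE, TRUE),
  fruity    = c(FALSE, TRUE,  TRUE,  FALSE, FALSE)
)

# Bite-sized AND (chocolate OR fruity): a row is kept only when the
# whole expression evaluates to TRUE for that row
with_parens <- candy %>%
  filter(pluribus & (chocolate | fruity))

# Without parentheses this is read as (pluribus & chocolate) | fruity,
# so a fruity candy that is not bite-sized also gets through
without_parens <- candy %>%
  filter(pluribus & chocolate | fruity)

print(with_parens)
print(without_parens)
```

Here Candy C is fruity but not bite-sized, so it is dropped by the first filter and kept by the second.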
So this process of using filter and group_by and summarize can be really powerful for manipulating these big, or this isn't that big, but you know, big data sets and things like that. So good job. So let's see, great. So then this brings us to the third exercise, which, let's see, I'm losing myself in shares and shared screens. Let's see. So did anyone get to exercise three and would like to share what they did to effectively take what you did in exercise two and adapt it for making this other type of plot, to compare chocolate that's either in a bar form or in the pluribus form? Any takers? So it sounds like people maybe didn't quite get that far. And so what I'm going to do is cheat a little bit. And so just so you know, the solutions I've put at the bottom, and in case people have to take off, there's a survey that I would really love for you all to take. But if you click on this button to show the answer, I'm going to take the code for exercise two here and paste that into exercise three. So this set of code, these three code chunks, is what I think Katherine has just shown us. Why doesn't it like me? There we go. And so as we saw, yeah.

You're not sharing your screen.

Thank you. This is like the coding version of "you're on mute." Okay, sorry. So I've just copied over exercise two into exercise three and you can see the plot that I made. That's like what we just saw. And so the question was how can we modify this code to look at bar candy versus bite-sized candy that's chocolate. And so I learned to code, and I try to teach people to code, by doing this type of exercise where you take something that you know works for something else and then modify it to take on a different question. And so what I'm going to do is I'm going to change pluribus to chocolate and then chocolate or fruity to be bar or pluribus. Right?
And so then this line will return all those rows that are chocolate and either bar or pluribus. And then in my pivot_longer, I need to change my chocolate and fruity to be bar and pluribus. And so this will get me my, we'll call it chocolate data. And so if we look at the chocolate data, we see all the rows are true for chocolate and it's kind of scrolled off the end, but it also is all the things that are pluribus or bar somewhere in here. It's in the type and answer columns, the pluribus and the bar. So I can then count that using the same code that we'd used before, except instead of pluribus data, I need to use chocolate data. And I see that there were 20 chocolate bars and 12 pluribus chocolates, and 61.5 is their win percent. But I think I'd asked for price percent. Let me just change that really quick. And so we see that by the median, the bar chocolate tends to be a little bit more expensive than the pluribus chocolate. And if I then think about modifying this chunk of code to build a plot, I can do chocolate data. I'm gonna change win percent to price percent. And for color, I think I will make that peanutyalmondy. And my breaks will be bar or pluribus. And I'll change my labels to be bar and pluribus. And I will say, this is the price, I'll just call it an index. And then I'll say the title, based on this plot and this table output here, is that chocolate bars cost more than chocolate bite-sized candies. And so hopefully this works. And something I noticed is that my y-axis was scaled from zero to 100. And I'm thinking that I've done this before with my own data. I bet this goes from zero to one. And so what I'll do is I'll change this ylim from zero to 100 to be zero to one. So I rerun that, it expands, and we then see the price index percentiles for the bar versus the bite-sized candies. And then it's colored by whether or not there's peanuts and almonds. So again, these solutions are here at the end of today's session.
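The exercise three modification described above can be sketched end to end. This is a hedged sketch, not the posted solution: the table is made up, the summary column names (`N`, `med_price`, `iqr_price`) are hypothetical, and using `coord_cartesian()` for the y-limits and `theme_classic()` for the theme are assumptions, since the session does not say which functions the solution uses.

```r
library(tidyverse)

# Hypothetical toy table; these are NOT the real 538 survey values
candy <- tibble(
  competitorname = c("Bar 1", "Bar 2", "Bar 3",
                     "Bite 1", "Bite 2", "Bite 3", "Fruit 1"),
  chocolate      = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  bar            = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  pluribus       = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  peanutyalmondy = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  pricepercent   = c(0.80, 0.60, 0.70, 0.50, 0.30, 0.40, 0.20)
)

# Chunk 1: chocolate candies that are either a bar or bite-sized
chocolate_data <- candy %>%
  filter(chocolate & (bar | pluribus)) %>%
  pivot_longer(cols = c(bar, pluribus),
               names_to = "type", values_to = "answer") %>%
  filter(answer)

# Chunk 2: count, median, and IQR of the price percentile per form
price_summary <- chocolate_data %>%
  group_by(type) %>%
  summarize(N = n(),
            med_price = median(pricepercent),
            iqr_price = IQR(pricepercent))

# Chunk 3: strip chart; price runs from 0 to 1, so the y-limits do too
p <- chocolate_data %>%
  ggplot(aes(x = type, y = pricepercent, color = peanutyalmondy)) +
  geom_jitter(width = 0.1, height = 0) +   # spread points sideways only
  scale_x_discrete(breaks = c("bar", "pluribus"),
                   labels = c("Bar", "Bite-sized")) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(x = NULL, y = "Price index",
       title = "Chocolate bars vs. chocolate bite-sized candies (toy data)") +
  theme_classic()

print(price_summary)
print(p)
```

Only three things change relative to the exercise two pattern: the columns used in the filter and pivot, the y variable, and the y-axis limits.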
If you weren't able to get through all the exercises, that's perfectly fine and understood. I tried to give a little bit extra, just in case some people were blazing through things. I also didn't know what to expect in terms of how long it would take people to do things. So we'll get better with time. One homework exercise that I would assign for you, it's not graded. You don't have to do it. I'm not grading it. You don't have to send it to me. It's to build your own strip plot using data from the candy data data frame and see what you can come up with for a question that's interesting to you. Again, I'm happy to answer other questions, but I want to be sensitive to people that have to leave, so there's a link here to go to a Google Form survey. You don't have to leave an email, it's anonymous, but I would really appreciate getting a sense of who's on here and information from you. So I'm going to stop sharing this and see if there's other questions. So somebody asked, are Boston Baked Beans some kind of candy or just regular baked beans? I guess it's a candy.

A candy-covered peanut.

Okay, so is it like a peanut M&M perhaps? Okay. Any questions that came up about the code or about discussions that you maybe had in your group? All right, so this feels a little fast, but hopefully you've got something out of it. If nothing else, you were able to interact with someone else and talk to them about code. Those of you that have taught know, and I said this at the beginning, that if you have to teach somebody something, you're going to learn it that much better. It's also great for people just starting out: if you can explain something back to somebody, that really shows real growth, that you're really learning the content and that you're making progress. And so that's something that's great. So my plan is to come back next Thursday, same time, same place, and hopefully I haven't eaten too many more peanut M&Ms, and we'll come up with a different data set.
Again, please answer that survey. One of the questions in there is to get a sense from you of what are the things that you'd like to learn about and what are some of the other things that you'd like to see in R. I have things that I can come up with, but that doesn't mean they're interesting to you. And there's some other fun 538 data sets that perhaps you could explore or other things we might decide to look at. All right.