 I believe you can answer your own data analysis questions, do you? If you do, stick around for another edition of Code Club. I'm your host, Pat Schloss, and my goal is to help you to grow in confidence to ask and answer questions about the world around us using data. This week, we'll build upon our recent experiences analyzing the data I've been recording for the past five years from my local livestock auction where they sell sheep. We'll add a new function, PivotLonger, from the TidyR package, to see whether there's an ideal time to market each class of lambs or whether they all track each other. In last week's Code Club, we saw that large lambs are more valuable in the spring and that if aged sheep have any value at all, it's typically en-jannuant. But what about the other classes of sheep? As a producer, I have the choice of determining when to take lambs to the butcher. Here in southeastern Michigan, there's a large Muslim community that eats a lot of lamb. They generally prefer lambs that weigh between 60 and 80 pounds. Other buyers prefer large lambs to get more meat per cut. All of this means that I have many options for how to market my lambs. Since we produce lambs throughout the year, I'd like to know whether there is an ideal time to market each class of lamb or whether they all track each other giving more value in the spring. That's the question we'll answer for today's episode of Code Club. For today's Code Club, we'll build off of what we have done for the past few weeks. Don't worry if you missed those episodes, we'll start with the block of code that we used last week and see how we can modify what we did for large lambs to look at all lamb classes. To do this, we'll need to learn about the PivotLonger function from the tidyR package, which is part of the tidyverse. This function is related to gather, which is the older version of PivotLonger. Know that if you see people mention using gather, that it's okay to keep using that function, but the package maintainers are encouraging people to pivot to PivotLonger. As always, don't watch the video straight through without first firing up our studio and trying the code and exercises yourself. If you haven't already followed the setup instructions, go ahead and pause the video so you can go over to the Riffimona's project website for this Code Club and make sure you have everything you need. I've gone ahead and copied the code from last week that we were working with into my session of RStudio and saved it as an RScript file. Know that if you look at the notes down below, there's a link going back to the Code Club page for today that will have the same code presented there for you to copy and paste into your own RStudio. So again, if you want to go ahead and grab that, pause the video, go grab that, and then come back and we'll get going. All right. So the code that we had, more or less from last week, plotted the prices for the last five years of large lambs. So what I have is that code, and I just want to briefly run through that code with you all. So we're all up to the same page. So you'll recall that we used Library Tidyverse and Google Sheets so that we could read in data from my Google spreadsheet. And this is the long URL. Definitely don't want to type that in by hand. So be sure you're copying and pasting that over. So I'll go ahead and run these lines, loads those. We'll be working with this auction data data frame of fair amount today. Again, what it does is it reads in the Numbers and Prices page from that Google Sheet. We look at columns A through J. We format it to have the first column be a date, the second column be the number, and then the rest of the columns are character types. We denote an NA value in the workbook as a NA rather than the default of a blank space. Because my column headings were mixed cases and had spaces, we need to do some formatting on those. And so that's what this rename all and rename function do here. As we run this, we get a prompt for figuring out which account we want to read in from. If you've never done this before, it might pop open a browser window to have you do some authentication there as well. So we'll go ahead and press 1 to use my Gmail account. It's reading it in. You only have to do this one time per session. And again, if we look at auction data, we see we get this data frame, again, where we have the date, the total number of animals, and then these different weight classes of animals going across. We notice also that the individual for the date and class of animal, there's numbers separated by hyphen. And these numbers are the lowest value and the highest value that animals sold for in that weight class for that week. So it's 160 to 200. The 200 means it's $200 per 100 pounds of weight, so $2 per pound. I don't know why they do 100 pounds, maybe so they don't have to deal with decimals. Who knows? All right. So to separate that apart, what we needed to use was a command called separate, which again, we learned about last week. So we'll go ahead and run this. And when I look at my large prices, that I get a date column, the min value, and the max value. So you'll note we also used a select function to focus in on those three columns. And then finally, we built the plot and fitted a smooth blue line through the data. And we talked about putting different titles and labels on the axes. So this is where we're starting from. Now, this is for the large animals. We also made one for aged sheep, so the older sheep that are past the prime and going off to become dog food. We'd like to look at the different classes of lambs in this week's Code Club. So I want to look at the small, medium, large, extra large classes and thinking about how do we get four lines across this plot. And so to get there, we're going to come back to auction data and look at what this data frame looks like. I'll make a little bit more room here. And so what you'll see, again, we have our date, the total number of animals, and then we have aged sheep, feeder lambs. We have all the different categories. So this format doesn't really play well with the ggplot syntax that we're used to seeing. So with ggplot, what we typically have is each column is mapped to a different aesthetic, a different factor, a different component of our plot. So we saw last week, and in this plot that we just generated, the x-axis is from the date column. We had a min and max and midpoint to get the midpoint price on large animals. In the previous week, we looked at the total. So that went on the y-axis. So we've talked about x and y aesthetic. What we'll talk about this week is color. And so we'll map a column to the color so that each class gets a different color. And so what we need to do is we need to take each column and concatenate it below the other. So we can imagine having a column of date, perhaps that column of total, a column of class, and then a column of the min and the max. And so we'll build up to that. But what we need to do, let's think about how we want to change things is we want to concatenate different columns for different classes of sheep. So once we've concatenated all those columns, the next thing we'll want to do is to separate the min and max values into different columns. And then we're going to want to calculate the midpoint price. And then we'll want to plot. So those are the four steps. We have already seen these last three steps. What we're going to talk about today is how we concatenate this. And this is what's called pivot longer. And pivot longer will allow us to take our wide data frame and make it long. And in this process, we'll make our data what's called tidy. The idea of tidy data is that each column represents a different variable, and each row represents a different observation. And so the data frame, as it currently is in auction data, is not tidy. Each row represents a different date, but each column represents a different observation for each class. And so again, we want to put all our observations in rows. And our variables, again, thinking about what might we want to plot, are variables as our different columns. So to do this, as I mentioned, we're going to use a function from the tidyR package that comes to us through the tidyverse to use the pivotLonger function. And so pivotLonger, pivotUnderscoreLonger, has a number of arguments. The main arguments that we're going to use are data, calls, names, to, and values from. Values to, sorry, values to. And so this is the basic syntax that we're going to be using. As you saw as I was typing in the different argument names, there's a lot of other arguments that come up. And perhaps we'll touch on those in future code clubs, but I want to keep things simple because that's really all we need to do to generate the plot we want. So the data is the auction data. And as we've been doing, we're going to pipe this into pivotLonger. And then calls are the column names that we want to be pivoting longer. We want to be concatenating on to each other. So I'll make this into a vector. Let's do small, medium. I'm only going to put two in there for now to give this a test. And maybe I'll go ahead and do large. All right. And then names to is the name of the column that's going to have the names of our previous columns. So I'm going to call those classes. I need to put that in quotes. And then values to, I'm going to put this as price range. OK. So again, we're going to take, and I'll maybe put in some extra room down here, we're going to take the small, medium, and large columns. We're going to concatenate them into one single column. The column of prices is going to be called price range. The column of classes or column names is going to be called classes. And so again, if we run this, what we see at the right side of my data frame here are two new columns, so classes and price range. So we have small, medium, large. And you'll notice that we no longer have the small, medium, large columns in here. OK. So again, what we did was we took those small, medium, large columns, and we concatenated them on to each other. Now, I'd like to make a data frame that has all of these other classes, but I don't really want to have to type in all of those other class names. So we can set this calls argument in a slightly different way. So again, we could do auction data, pivot longer, and then calls equals. And then what we could do is we could tell pivot longer, which columns we don't want to pivot longer. And so we could do minus C and date and total. And then we'll again do names to classes, values to price range. So the difference between this function call and this function call is that here, we're telling it which columns we want. Here, we're telling it which columns we don't want. The advantage of telling it what we don't want in this case is that it's less typing. You could go ahead and you could type out all the column names, but who has time for that? So again, if we run that, we now get a very long data frame that has four columns. And again, thinking about each column as a different variable, we have the date, the total number of animals sold on that date, the class of animals sold on that date, and the price range for that class of animals on that date. And so this is what's called a tidy data frame. All right, so we've achieved this first step of concatenating columns for different classes of sheep using this pivot longer function. Excellent. The next thing we want to do is separate the min and max into different columns. Again, we talked about this last week, but we can do separate price range into, and we'll say min. I've got to make this a vector of values, min and max. And then I'm going to do sep on the hyphen. We talked last week that you don't actually have to put in the sep value if it's obvious, but I like putting it in just to make everything crystal clear. And sure enough, now we have two new columns replacing price range of min and max. I will then add a new column for the midpoint price that I'll generate with mutate. So we will then say midpoint. And this is going to be my min plus my max divided by 2. What is it doing? So it's complaining that error in min and max non-numeric argument to binary operator. And so if I look at my data frame, I notice a very subtle thing that min and max come in as characters after the separate function rather than as numbers. And we talked about this also last week that we need to use the convert equals true argument to convert the characters to numeric values. And we do that. Everything is good. We go ahead now, and we have min and max coming in as integers, and midpoint is a double. And so this represents the midpoint of our min and max values. So very good. And we'll go ahead and call this, let's call this tidy auction data. Excellent. Now we're ready to plot the data. So we'll go ahead and do tidy auction data into ggplot. This should look very familiar. aesx equals date. y will say midpoint. And something new that we're going to do is we're going to have the color aesthetic, as I mentioned a little bit ago. So if we say color equals classes, then that will effectively group our data by class. And each class then will get a different line. And we can then put this into geom line, open close parentheses, and voila. It tells us that it removed 270 rows containing missing values. We know that there's a lot of NA values in there. If we wanted to, we could go ahead and get rid of those. But for now, that's not really the point of our exercise today. So what we see is we get a plot showing all, how many? There's, I guess, eight different weight classes here, eight different classes of sheep, and their different values. Down on the bottom are those aged sheep that really aren't worth much. And we can kind of see some periodicity, some seasonality to this data, which kind of looks like things prices rise through the spring and then fall through the rest of the year. So what I'd like to do is to perhaps instead of looking at all eight weight classes, I'd like to instead look at the small, medium large and extra large animals to kind of focus on those animals that are more market class. Hair lambs are also perhaps a market class, but there's not a lot of them. The new crop, I'm not quite sure always what those are. Aged sheep are kind of a specialty. And feeder lambs are lambs that are typically not ready for market, not ready to be slaughtered, but are going back to be fed to gain more weight. So to focus on those four classes, we'll use the filter function. And we'll say classes equals equals small, or classes equals equals medium, or classes equals equals large, or classes equals equals extra large. And we want to be sure to add a pipe to the end of that line. And so now when we run that, we see sure enough that we have the four lines. The animals with the highest price are the small, followed by the medium, followed by the large. And actually the large and extra large seem pretty on each other. So I'm gonna come back down here and grab my labs and my theme function from last week to concatenate on to my pipeline here. And I'll change my title to say that, what should we say? Let's say something like that small lambs have the highest price, but all lambs peak in spring. I'm gonna go ahead and get rid of the subtitle and everything else looks pretty good. I'm gonna use that theme light because it has the lighter grid lines that kind of tell us where the years break. And so sure enough, that works. At least in my rendering of this plot, the title runs off the screen. So I'm going to go ahead and put in a line break, like a backslash N, and that'll get the title to wrap on itself, okay? So the new thing here from the previous weeks has been this pivot longer function, right? And so again, to recap the interesting thing or the syntax about pivot longer, so we give it the data, the columns that we want to pivot longer or to concatenate, the name of the column that'll contain the column names and the name of the column that will have the values in those columns, okay? And again, that's really the only new thing that we're putting into this pipeline. And so again, with very small changes to a pipeline, we can make a dramatic effect. And this plot, frankly, I find really intriguing to me. I didn't realize that small lambs were so much more valuable. I mean, I knew they were higher price, but I didn't notice that their prices kind of fell through the last half of the year. And also, if you kind of look a little bit closer, maybe I'm kind of looking too closely, it seems like the small lambs peak in price before the mediums, before the largest and before the extra-larges. One other thing you'll notice is that we lose the data for small lambs in about 2019. One of the exercises to follow will have you look at perhaps what class is replacing those small lambs. Is it the feeder lambs or is it the new crop category? Now I'd like you to engage with the material that we've discussed today to answer other questions that I have about these data. Doing these exercises will really help you to strengthen your skills in analyzing data. The first question is that on the second page my lamb prices workbook is the column called Holidays. Is the tab called Holidays? Sorry. And you'll notice that the rows here are the different holidays, the columns are the years, and then the cells contain the value of the date that the holiday falls on for that year. So you'll notice that Passover and Easter and the different Eid holidays and Ramadan vary over the course of the year where something like Christmas is December 25th every year. So what I'd like you to do is go ahead and take that data frame, read it into R, and make the data frame tidy so that there are columns titled Holiday, Year, and Date. Next, on the second question, the data that we made in the final plot was pretty noisy. What happens if you use GM Smooth either in addition to GM Line or in place of GM Line? Remember the different arguments that we used last week or in the previous weeks with GM Smooth to make it look a little bit more aesthetically pleasing and do a better job of fitting the data. Finally, I'd like you to create a plot comparing the small feeder lamb and new crop classes over the past five years. As I mentioned, the 2019, the small lambs went away from being recorded, and I'm curious which category most resembles that small lamb class in its prices, okay? So go ahead and pause the video. When you're done, play again, and I will show you my results to these different questions. Well, I hope you enjoyed working through these exercises, and even if you weren't able to get through all three of them on your own, I hope it kind of scratched some sparks of your brain to get you thinking about what we've covered over the past few weeks or perhaps what you've known from other contexts of learning are. So for this first problem, I'm gonna go ahead back up to where I defined auction data, and I'm gonna copy this read sheet component, and then I'm gonna modify that to read in the holidays tab. So instead of sheet numbers and prices, I'm gonna put in holidays, and I'm gonna go ahead and remove these lines because we use that A through J because there's some random stuff in column K, but let's read this in and see what we get. I'm gonna make this a little bit bigger so it's easier to see, and what we see is that it's reading in the column for the holiday, 2015, 16, 17, 18, 19, 20, and 21. It's reading in and in as a date time variable. So I'm gonna go ahead and change this to be call types, and I'll say C, and then we've got seven dates. So I'm gonna use the capital D to indicate the date format, and so we read that in, everything looks good. Now what we wanna do is we wanna pivot longer, and so I'll do calls equals minus C holiday, and we'll then do names to year, and values to date. So I'm gonna want the columns that we want to join on, the names, the column that the names of our columns that we're concatenating with what that column is called, and then the column name where our values or our dates is gonna go, okay? And so we run all this, and sure enough we now get a data frame that has the holiday, the year, and the date. Excellent, so we've mission accomplished on this. So the next question asks us, the data in the final plot we made was pretty noisy. You know, you're looking at this and it's pretty jittery. What happens if we use geomsmooth in place of geomline? So I will go ahead and grab this block, paste it in here, I'll run it again to make sure everything works fine, and instead of geomline, let's put in geomsmooth. And so those lines don't really look at all like the data we had. Perhaps to make it more clear what we had, we could also plot in geomline to get a sense of that. And I mean, the lines do go through the data, but it's not a very, it doesn't really track. So we talked last week about using span equals zero, smaller values of span will make the curve more wiggly. So I'll use 0.1, and sure enough then, we see that our lines do track the observed data. I'm gonna go ahead and remove geomline because it makes things kind of noisy, and I'll go ahead and do SE equals false because I don't want that gray cloud, again for my viewing purposes. And sure enough, we see a much clearer picture of what's going on, and at least to my eye, it does seem that those small lambs peak in value before the medium, which peaks before the large and the extra large. And you can kind of see that across all of the classes. And so again, what the next question we'll have us engage with is, after the small lambs went away in 2019, what were they replaced by? All right, so to answer that, again, I'm gonna copy this code chunk down for this third problem. And instead of looking at all these classes, I'm gonna look at small feeder lamb and new crop, and I'll get rid of the extra large category here. All right, and so if we run that, let's see, I think it's feeder lambs, not lamb. There we go. And so what we see is prior to them getting rid of that small class, the small and the new crop kind of tracked each other. It was really confusing because I would talk to the manager of the auction and say, what's the difference between small and new crop? And they sounded like about the same thing. So anyway, I think they got rid of it because other people were asking the same question. But you can see the feeder lambs bop around at a lower price point, but that the new crop and the small track each other fairly well. So I'm gonna update my title to say that small lambs or the small lamb category was replaced by the new crop category. And again, I have got that, wrap my title a little bit better, and there we go, okay? So know that with when we added that aesthetic for color that it's giving us some default colors. Here it's using a red, green, blue. And those might not be the best colors or your favorite colors. In a future code club, we'll talk about how we can customize those colors as well as customize this legend a little bit so that perhaps instead of being underscore feeder lambs, it could be feeder lambs, capitalized or title case, and same for new crop and small, okay? But again, we don't wanna overburden us with too much new information. The focus of today's code club really was looking at the pivot longer function to take a wide data frame to make it longer. And with that, making it tidy. Thanks again for joining me for this week's episode of Code Club. Be sure that you spend time going through the exercises on your own to help reinforce your new skills. Even better would be for you to take the data and tools we've worked on today to answer your very own questions. I'd love to see what you did. Please feel to drop a line in the comments below to tell us what question you were eager to answer. Also, please let me know what types of data analysis questions you have and I'll do my best to answer them in a future code club. Please be sure to tell your friends about code club and to like this video, subscribe to the Riff-Amonas channel and click on the bell so you know when the next code club video drops next Thursday. Keep practicing and we'll see you next time for another episode of Code Club.