 Hi, I'm John Little and you're watching the Introduction to R instruction series. This series is part of the Rfun learning resource website sponsored by the Center for Data and Visualization Sciences, part of the Duke University Libraries. Today, we're going to learn about the dplyr package. The dplyr package is part of the tidyverse suite of packages. It helps you transform your data into a shape that is amenable to analysis, whatever your purpose is. We'll introduce five dplyr verbs and we'll talk about a few keyboard shortcuts and other tips.
Now let's talk about the five verbs. Well, let's start with three: filter, arrange, and select. Filter allows you to subset your data frame, your rectangular grid, by rows. So you're making your long number of rows shorter. Select allows you to subset your data grid by columns. So if you have a wide set of columns and you only want a few, you can select just a few. And then arrange allows you to sort those rows by the variable values.
We'll start with filter. If I had a data grid of three columns and four rows, and I just wanted two rows, I would have to filter that by some value. So if I have a column called eye_color, and only two of the rows have the English word orange as the value in eye_color, I can filter my four-row data grid down to a two-row data grid. In other words, just taking the blue rows by the value of the variable eye_color.
First, I'm going to run my libraries. Loading dplyr is actually redundant because dplyr is part of tidyverse; tidyverse is a mega package that loads eight packages. An onboard data set in dplyr is starwars. If I execute that, and I'm going to expand this so it's easier to see, I can see that I have an 87-row data frame with 14 columns. Although I can't see all 14 columns and I can't see all 87 rows, I can scroll. And these are characters in the Star Wars film series.
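The setup and the four-row filter illustration described above might look roughly like the sketch below. The toy data frame and its values are made up for illustration (they are not from the video), and only dplyr is loaded here rather than the whole tidyverse.

```r
# Load dplyr on its own; library(tidyverse) would attach it as well.
library(dplyr)

# starwars ships with dplyr. Printing it shows the first rows and columns;
# the video describes it as 87 rows by 14 columns.
starwars
dim(starwars)

# Hypothetical toy grid: three columns, four rows (values invented here).
toy <- data.frame(
  name      = c("A", "B", "C", "D"),
  height    = c(100, 120, 140, 160),
  eye_color = c("orange", "brown", "orange", "blue")
)

# filter() keeps only the rows where the condition is TRUE,
# so the four-row grid becomes a two-row grid.
toy_orange <- toy %>%
  filter(eye_color == "orange")

toy_orange
```

The same filter(eye_color == "orange") call should work unchanged on starwars, since that data set has an eye_color column too.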
Now if I want to filter those, I can start with the starwars data frame and then, using the pipe operator, filter where mass is greater than or equal to zero. Because notice in mass, if I start scrolling through, some of the values are not numbers, they're NA. So this is one way to get rid of NAs. It's not actually the best way. I'll show you three different ways. So starwars and then filter where mass is greater than or equal to zero. If I run that, I now get 59 rows as opposed to the 87 rows I had earlier.
Now if I want to preserve that subset of the data frame, I have to use the assignment operator. So I'm going to bring back my code as it was originally just a moment ago. I'm going to use the assignment operator, and I'm going to assign the value of this expression to starwars_small. And I'll run this one more time. And here I have my 59-row dataset, subset by mass greater than or equal to zero.
Okay, what if I wanted to sort this data frame? Well, I could do that. I could sort by height. I could sort by name, which I'll also do. And you'll notice that I'm using the desc function here, because the default for arrange is ascending order. If I put one of those variable names in the desc function, it'll be in descending order, which for name means reverse alphabetical order. Okay, so I'm starting with starwars_small and then arranging. And when I run that, I'll see that the values in my height variable are listed in reverse numerical order, starting with 234 centimeters. And down here at lines 9 and 10, I can see that these two characters have the same height, 198 centimeters. And that's where the subarranging comes in. I'm subarranging by name in reverse alphabetical order. So K now comes before D.
Here's another visual representation of arrange: if I had four different rows and I wanted to arrange them so that the color name dark blue comes before light blue, which comes before red. All of this code here will work if you want to put that into either your script.
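As a rough sketch of the filter, assignment, and arrange steps just narrated: the object name starwars_small follows the narration, while starwars_sorted is a name introduced here only so the result can be inspected.

```r
library(dplyr)

# NA >= 0 evaluates to NA, and filter() drops rows whose condition is NA,
# so this keeps only characters with a recorded mass.
starwars_small <- starwars %>%
  filter(mass >= 0)

# The video reports 59 rows at this point.
nrow(starwars_small)

# arrange() sorts ascending by default; desc() reverses the order.
# Ties in height are broken by name in reverse alphabetical order.
starwars_sorted <- starwars_small %>%
  arrange(desc(height), desc(name))

starwars_sorted %>%
  select(name, height, mass)
```

Without the assignment, the filtered result is only printed; assigning it is what keeps the smaller data frame around for later steps.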
All right, now I can also subset by columns. I can select my columns by variable name. I can select them by position, in this case 2 through 4, or a combination thereof: name through mass, column 10, column 7, and columns 4 through 6. So starwars_small, taking that data frame that I had saved earlier. I'm going to go ahead and arrange it so that I have, in this case in ascending order, species first, followed by height. Let's have a quick look at that. So species first: an Aleena species, several other species, four droids in alphabetical order. And then I want to select just name, species, height, mass, and birth_year, rather than all 14 columns. So if I run this whole code chunk, everything is displayed on the screen. I have five variables and 59 rows.
Let's take a look at mutate. Mutate creates a new variable, and you can create that variable out of functions that you pass to mutate. With this first example, I'm taking my starwars data grid and then mutating, so creating a new variable called big_mass, which is equal to mass times 100.
Here's mutate. I'm going to take my starwars_small. I'm going to run some other functions that I had been running before. Let's see what we get there. starwars_small, select five variables, arrange them in descending order by height, subarranged by mass in ascending order, and then subarranged by name in reverse alphabetical order. Then I'm going to filter where species equals Human. You can see that right there. Notice the double equal sign, which tests for equality. And then finally, mutate, where BMI gets its value from, so that's another assignment operator, height divided by mass. And that formula is inside of the round function. The round function allows me to limit the result to two decimal places, or however many I choose. So if I run that whole thing, I now have BMI down to two decimal places.
Next we have summarize and count.
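Before moving on, the select and mutate steps described above could be sketched as follows. The column names come from the starwars data set; by_name, by_position, mixed, with_big_mass, and humans are names introduced here for illustration. The BMI column uses height divided by mass because that is the ratio narrated in the video.

```r
library(dplyr)

starwars_small <- starwars %>%
  filter(mass >= 0)

# Select columns by name, by position, or a mix of both.
by_name     <- starwars_small %>% select(name, height, mass)
by_position <- starwars_small %>% select(2:4)
mixed       <- starwars_small %>% select(name:mass, 10, 7, 4:6)

# mutate() adds a new column computed from existing ones.
with_big_mass <- starwars %>%
  mutate(big_mass = mass * 100)

# The longer pipeline: select, arrange, filter, then mutate.
humans <- starwars_small %>%
  select(name, species, height, mass, birth_year) %>%
  arrange(desc(height), mass, desc(name)) %>%
  filter(species == "Human") %>%         # == tests equality
  mutate(BMI = round(height / mass, 2))  # ratio as narrated, 2 decimals

humans
```

Changing the second argument of round() changes how many decimal places are kept.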
Summarize allows you to reduce multiple values into a summary. Count is really a special summarize function. I think if you learn count first, it's an easier way to understand the way summarize works. So for example, in this case, I have the starwars data set. I might want to count how many characters show up in each gender field. starwars and then count gender. And it tells me that 14 of my characters are feminine and 66 of my characters are masculine and four don't have a gender.
All right, another example of counting. We're going to count the number of characters for each mass value, and we're going to have that sorted. Sort is another argument in count. And we'll learn yet another way to drop NAs: we'll drop any character that doesn't have a mass listed. And when we run that, we see that we've got six characters that weigh 80 kilograms. And that is skewed right with a long tail. So a bunch of mass values show up only once, including this character right here, which is Jabba the Hutt.
Summarize works like that. Let's build on what we learned here. Let's get the total weight of all the characters: starwars, drop the NAs, and then summarize, I should say, total_mass equals sum of mass. And so now we know that if you added together all the characters that have a mass listed, the total mass would be 5,741.4 kilograms.
Here's yet another way to filter out NAs, where we're using the is.na function. And we're preceding that with the exclamation point, or bang, which negates. So find Star Wars characters that do not have an NA in the height variable. Then group by species. Group by often works with summarize. If we just run these first three lines, we're not going to see anything different. Even after we group by, visually they're still not grouped; the grouping takes effect once we use summarize. Because what we want to know is how many humans we have and what the mean height of humans is. We count each species using the n function.
We're going to get the mean height using the mean function, the min height using the min function, the max height using the max function, and we're going to sum the height. Finally, we're just going to arrange all of that in descending order on count. So we can see here we have 31 humans. The mean height of the humans is 176 centimeters. The minimum height is 150. The maximum height is 202. And the sum of the heights of all the humans, 5000.
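The count and summarize steps narrated above might be written along these lines. This is a sketch under a few assumptions: the "drop" step is taken to be drop_na() from tidyr, the last summary column is taken to be a sum, and the result names (gender_counts, mass_counts, total, by_species) and summary column names are introduced here for illustration.

```r
library(dplyr)
library(tidyr)  # assumed source of drop_na()

# count() tallies rows per value of a variable.
gender_counts <- starwars %>%
  count(gender)

# Drop rows with no mass, then count per mass value, largest count first.
mass_counts <- starwars %>%
  drop_na(mass) %>%
  count(mass, sort = TRUE)

# summarise() collapses many rows into one summary row.
total <- starwars %>%
  drop_na(mass) %>%
  summarise(total_mass = sum(mass))

# Grouped summary: one row per species, using !is.na() to drop missing heights.
by_species <- starwars %>%
  filter(!is.na(height)) %>%
  group_by(species) %>%
  summarise(
    count       = n(),
    mean_height = mean(height),
    min_height  = min(height),
    max_height  = max(height),
    sum_height  = sum(height)
  ) %>%
  arrange(desc(count))

by_species
```

group_by() on its own only marks the groups; the per-species rows appear once summarise() runs on the grouped data.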