Next we get into dplyr, which is where you get into the real manipulation of the data. So this is not about how the data is structured or how it's imported, it's about working with it. And there's a set of functions here: select, filter, arrange, group by, summarize (which works with either spelling, summarize or summarise, for both American and British R users), mutate and join. In this section we're going to work through select to mutate. There's a separate one on join because it's that important.

So select is about isolating a subset of columns. Now this is a little bit different from extracting a column as a variable or a value, because the result stays a table, and you can save it as a separate table. So it's not a column, it is a subset of columns. The first table here has all the columns, and the second one only has some of them.

And select is fairly straightforward. So again you have a new name for your new thing, then the assignment arrow in R, then the set you're starting from, your original all-columns set, then pipe, then select. Then you just list the columns that you want in your new one. So here you just want name, gender and universe. Another option, if the columns you want sit next to each other, is the range option. So if I wanted gender, film and universe, I could have put gender colon universe and got that range. And you can also drop a column by using minus and then naming the column that you want to chuck.

So these are a couple of different ways that you can get subsets of columns in a new table. This is useful because maybe your data set comes with a lot of years of data, but you're really only concentrating on a few years. So you can just grab the years that you want and save yourself some headache.

Filter is exactly the same, but for rows instead of columns. So you can filter by a single criterion.
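As a rough sketch of the select variations and the single-criterion filter just described: the heroes table below is a hypothetical stand-in for the one on the slides, and its column names (has_film, start_date, film_date) and values are assumptions for illustration, not the speaker's data.

```r
library(dplyr)

# Hypothetical stand-in for the slide data; names and values are illustrative only.
heroes <- data.frame(
  name       = c("Ant-Man", "Hellboy", "Elektra", "Character D", "Character E"),
  gender     = c("male", "male", "female", "female", "male"),
  has_film   = c("yes", "yes", "yes", "no", "no"),
  universe   = c("Marvel", "Dark Horse", "Marvel", "Other", "Other"),
  start_date = c(1962, 1993, 1981, 1991, NA),
  film_date  = c(2015, 2004, 2005, NA, NA),
  stringsAsFactors = FALSE
)

# List the columns you want in the new table.
subset_named <- heroes %>% select(name, gender, universe)

# Range form: every column from gender through universe, in table order.
subset_range <- heroes %>% select(gender:universe)

# Minus form: keep everything except the named column.
subset_minus <- heroes %>% select(-film_date)

# Filter rows by a single criterion; the string must match exactly.
females <- heroes %>% filter(gender == "female")
```

Each result is still a table, so it can be saved under its own name and piped into further steps.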
In this case, gender double equals female, with female in quotes, because you want it to look at the contents of a string and match it exactly. You can also isolate a subset of rows according to more than one criterion. So here, gender equals female and has film equals yes. So yeah, filter is great: we can isolate rows according to criteria. Again, this is fairly straightforward. I haven't shown you here that you can save these as a new variable, a new output, but that's the same as you've seen before, so I won't labor the point.

Arrange reorders the rows, alphabetically for text or numerically for numbers, according to the columns you name, in the order that you name them. This is just like in Excel where you sort according to column one and then column five or whatever. You can do a reverse arrange if you want them to go from Z to A, or from high to low if it's a numerical column. You can arrange by one column, by two columns, by however many columns you want, depending on what kind of data you have and how you want it ordered. Arrange doesn't really alter the data, it just alters the presentation of it. So you may or may not want to save this as a separate variable, because you can always just rearrange things differently if you want.

Now we get into something a little bit more interesting. Summarize turns a vector into a value. For example, if I wanted to find the mean start date, the mean year in which these comics were first published, I would summarize with mean, applied to the start date column. And quite often you will find it helps to have na.rm equals TRUE. This essentially says ignore anything that has a missing value, because R will throw a hissy fit, giving you NA for the whole answer, if any of the values are missing, and that is unhelpful at the best of times.
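A minimal sketch of the two-criteria filter, arrange, and summarize steps described above. The heroes table is again a hypothetical stand-in with assumed column names and illustrative values, not the data from the slides.

```r
library(dplyr)

# Hypothetical stand-in for the slide data; names and values are illustrative only.
heroes <- data.frame(
  name       = c("Ant-Man", "Hellboy", "Elektra", "Character D", "Character E"),
  gender     = c("male", "male", "female", "female", "male"),
  has_film   = c("yes", "yes", "yes", "no", "no"),
  universe   = c("Marvel", "Dark Horse", "Marvel", "Other", "Other"),
  start_date = c(1962, 1993, 1981, 1991, NA),
  film_date  = c(2015, 2004, 2005, NA, NA),
  stringsAsFactors = FALSE
)

# Two criteria: a row is kept only if both conditions hold.
female_with_film <- heroes %>% filter(gender == "female", has_film == "yes")

# Sort by universe first, then by name within each universe.
by_universe_name <- heroes %>% arrange(universe, name)

# Reverse order: desc() sorts from high to low (missing values go last).
newest_first <- heroes %>% arrange(desc(start_date))

# Collapse the start_date column to one value, ignoring missing entries.
mean_start <- heroes %>% summarize(mean_start = mean(start_date, na.rm = TRUE))

# Without na.rm = TRUE, one missing value makes the whole result NA.
mean_with_na <- heroes %>% summarize(mean_start = mean(start_date))
```

Comparing mean_start with mean_with_na shows why na.rm = TRUE is so often needed.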
Other functions you can use inside summarize: you can find the median, the interquartile range or the standard deviation, the min, the max or a quantile, all of these basic statistical summaries of a vector. You can also find out how many values there are and how many distinct ones there are. It depends on what you're trying to do with the data, but they all work more or less the same way as this: you would swap mean for, say, n underscore distinct if you wanted to find out how many different years were in this list.

And very helpfully, summarize can be combined with group by so that you can find the mean of each group. So for example, if we wanted to group by gender: new variable name, assign it, start with this data set, pipe it to group by gender, pipe that to summarize, mean start date, remove missing values. And that shows us that female superheroes started on average about 15 years after male superheroes. So there could be something there or it could just be coincidence.

All right. Now mutate turns vectors into a new vector. Essentially, this is a way to add columns derived from existing columns. Here we filter by has film, so we're only looking at characters that have films. Then we create a new column called film delay, which takes the film date and subtracts the start date. In other words, we want a new column that tells you the gap between a character first being published in a comic book and getting their first film. And we do that with a combination of filter and mutate. In this case, only these three have films, so they're the only ones here: start date, film date, film delay. So Ant-Man had the longest delay, Hellboy had the shortest, and Elektra is there in the middle.
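The grouped summary and the filter-plus-mutate step could be sketched roughly like this. As before, the heroes table, its column names and its values are assumptions made for illustration, so the grouped means here will not reproduce the gap quoted from the slides.

```r
library(dplyr)

# Hypothetical stand-in for the slide data; names and values are illustrative only.
heroes <- data.frame(
  name       = c("Ant-Man", "Hellboy", "Elektra", "Character D", "Character E"),
  gender     = c("male", "male", "female", "female", "male"),
  has_film   = c("yes", "yes", "yes", "no", "no"),
  universe   = c("Marvel", "Dark Horse", "Marvel", "Other", "Other"),
  start_date = c(1962, 1993, 1981, 1991, NA),
  film_date  = c(2015, 2004, 2005, NA, NA),
  stringsAsFactors = FALSE
)

# Swap mean() for n_distinct() to count how many different years appear.
distinct_years <- heroes %>%
  summarize(n_years = n_distinct(start_date, na.rm = TRUE))

# One mean per group: group_by() first, then summarize().
start_by_gender <- heroes %>%
  group_by(gender) %>%
  summarize(mean_start = mean(start_date, na.rm = TRUE))

# Keep only characters with a film, then derive a new column from two others.
film_delays <- heroes %>%
  filter(has_film == "yes") %>%
  mutate(film_delay = film_date - start_date)
```

Filtering before mutate keeps the new column free of missing values, since characters without a film have no film date to subtract from.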
So I expect there will be a few questions because that covered quite a lot of material.