I'm John Little, and you're watching the Introduction to R instruction series. This series is part of the Rfun learning resource website, which is sponsored by the Center for Data and Visualization Sciences at Duke University Libraries. In this section, we'll talk about exploratory data analysis, that is, EDA. That means we'll briefly discuss the skimr library. So let's go to the code, the file 02 join skim EDA. I'll expand my screen, and you'll notice that when we load packages, we're loading the tidyverse followed by the skimr library, which is a package created by rOpenSci.
Now, to prepare our data, we're going to use the onboard data set starwars. Then we're going to read in some data from the FiveThirtyEight website, 538.com. They did a survey of Star Wars character favorability: they asked people to rate how much they liked or disliked each character. I downloaded that data and transformed it a little bit. We'll read that in. I'm going to skip the first 11 rows; that's information about where you can get the data from. Let's have a quick look at the starwars data set, 14 columns and 87 rows, and at the favorability rating that we just loaded in. In this case, it's just a number: how many people voted that they thought very favorably of these characters. All right, notice that we have a name here and a name there, and that's what we'll use as our join key, the same name column in the two different data frames.
All right, before we do a join, let's use the skim function of skimr. If we use the skim function on the starwars data frame, it returns some information that we already know and could have gotten in other places, but it's handy here: 14 columns, 87 rows, and the data frame is called starwars. We can see that three of those columns are list type, three are numeric, and the remaining eight are character, or string, data.
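The setup described so far can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the temporary file, the `fav_rating` column name, and the rating values are made up for illustration, standing in for the downloaded survey file with its 11 lines of source notes.

```r
library(tidyverse)  # loads dplyr (with the starwars data set) and readr
library(skimr)

# Hypothetical stand-in for the downloaded survey file:
# 11 lines of notes, then a header row, then invented data rows.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(paste("note line", 1:11),
    "name,fav_rating",
    "Han Solo,10",
    "Yoda,9"),
  tmp
)

# skip = 11 drops the note lines so the 12th line is read as the header
toy_ratings <- read_csv(tmp, skip = 11, show_col_types = FALSE)

glimpse(starwars)  # quick look: 87 rows, 14 columns
skim(starwars)     # summary grouped by column type
```

Both data frames carry a `name` column, which is what makes the join described next possible.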
It then gives us information broken down by each of those types. Taking the eight string, or character, columns, it gives us each of the column names and tells us the number of rows that are missing for that particular variable; for example, there are five characters missing hair color. Then it gives us some information like the minimum, which in this case means the number of letters in the shortest hair color value, a four, and the longest, 13, and how many are empty and how many are unique. For the list data type, it tells us information about the lists. We're not particularly interested in that, so we'll move on. What I really like about the skimr package is that it does the same thing for numeric columns and, at the very end, gives us a nice little spark graph so that we can see the distribution of values: for height in this case, which looks relatively normal, for mass, which looks right skewed, and for birth year, which looks right skewed. Now we can do the same thing for the favorability rating data frame, and here we see a U-shaped distribution, which tells us something that we probably already know: they're movie characters, and by and large, you either love them or you hate them.
So let's move back to joining. Let's start with our left join. We're starting with our starwars data set and then using the left_join function to join it to the favorability rating data frame, and we're joining by the key called name. Below that, we'll just take a look at the result, arranged in descending order, and we can see that we matched on Han Solo, Yoda, all the way through to Jar Jar Binks. Okay, so we've done our join. Let's now use the skim function again on the joined data frame, sw_join, but first we'll drop anything that doesn't have a favorability rating, and there we see that the table is larger by one variable.
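As a rough illustration of the left join just described, here is a minimal sketch. The `toy_ratings` table and its `fav_rating` column are hypothetical, and the numbers are invented; only the three character names and the `name` key come from the walkthrough.

```r
library(dplyr)

# Hypothetical ratings table; the values are invented for illustration
toy_ratings <- tribble(
  ~name,           ~fav_rating,
  "Han Solo",      3,
  "Yoda",          2,
  "Jar Jar Binks", 1
)

# Keep every row of starwars; add fav_rating where the name matches
sw_join <- starwars %>%
  left_join(toy_ratings, by = "name")

# Inspect the result, highest rating first (unmatched rows are NA and sort last)
sw_join %>%
  select(name, fav_rating) %>%
  arrange(desc(fav_rating))
```

Because this is a left join, all 87 starwars rows are kept, and characters without a match simply get `NA` in the new column.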
It's now 15 variables instead of 14, and we've managed to join our U-shaped favorability rating along with our height, our mass, and our birth year. So that's a brief introduction to exploratory data analysis with skimr.
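One way to drop the unrated characters before skimming the joined table is sketched below. It assumes `tidyr::drop_na()` for the filtering step, which may differ from what the video uses, and it again relies on a hypothetical `toy_ratings` table with invented values.

```r
library(dplyr)
library(tidyr)
library(skimr)

# Hypothetical ratings table; the values are invented for illustration
toy_ratings <- tibble(
  name       = c("Han Solo", "Yoda", "Jar Jar Binks"),
  fav_rating = c(3, 2, 1)
)

sw_join <- starwars %>%
  left_join(toy_ratings, by = "name")

# Keep only the characters that have a rating
sw_rated <- sw_join %>%
  drop_na(fav_rating)

# Skim just the numeric columns to compare their distributions side by side
skim(sw_rated, height, mass, birth_year, fav_rating)
```

With the real survey data, the inline histogram for the rating column is where the U-shaped distribution would show up next to height, mass, and birth year.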