 Hi, I'm John Little and you're watching the Introduction to R Learning Resources series. This series is part of the RFUN Learning Resources website sponsored by the Center for Data and Visualization Sciences at Duke University Libraries. In this section, we'll talk about how to join or merge data together. Typically what happens in a case like this is people have two different data frames and they want to join them together. So in this example, we'll join the Star Wars data frame that we get from DePlyer with another data set that we imported from the 538.com website. Let's have a look. The code is in my Intro to R website and I'm going to open the file 02 join skim eda. Let's make sure that we load the proper libraries, although we only need tidyverse for this particular instructional video. This will simply tell us that one of the libraries that we've loaded is DePlyer. And DePlyer is the library that has the join functions. So first let's read in the Favorability Popularity Rating data, which I transformed from data found at the 538 website. What it is is a survey that 538 did asking people how much they like certain characters in the Star Wars series. So let's have a look at that. And here you see a series of characters, they're 14 actually, with the highest number being the most favorable and the lowest number being the least favorable. Let's also load the Star Wars data set. I'll do this formally so that you can see that we have two data sets available. When I execute that code, you can see over in my environment variable that I have two data frames available and this is an 87 row data set. Again on board comes with DePlyer consisting of Star Wars characters, 87 Star Wars characters with various characteristics like height and mass, hair color, etc. Let me go ahead and make the screen full size so it's a little easier to see what we're doing. So what I want to do is I want to join Star Wars with Favorability Rating and the common key that I can use to join that data because what I want to do is I want to bring this Favorability Rating into a larger data frame. The key to do that join is name. I have name here and name there. Now you could join, you could use different variable names for a join key and you can check the documentation for that. It's easiest if they have the same name. It's also best practice to join on a numeric key because numeric keys are less ambiguous. Natural language is more ambiguous. Proper names in particular, even more ambiguous. So there are tactics that you can use to make sure a join works really well but let's just go ahead and proceed and talk about some of those. I'm going to scroll down here to line 80 because the first thing we're going to do is we're going to do a left join. That's the kind of join that most people ask me about. That's where you have data from one data frame. Let's call it the left-hand table and you want to join it to another data frame. Let's call it the right-hand table and what will happen is you'll match on that key in our case name and if there's a match the data from the right-hand table will get merged over to the left-hand table. If there's not a match, the un-matching data in the left-hand table remains the same. That's what most people seem to want to do. That's what we're doing here. We're doing that join on the name key. We're selecting a certain order of columns, signing it to a variable, and then we're going to arrange that to look at it. So here we have, I have arranged the favorability rating to be the second column next to the name. And what you can see is that although we had 14 characters in the join favorability, in the un-joined favorability table, we didn't get all those 14 characters. I know because if I scroll through this, here's begins, we only got eight. If I scroll through this and it starts displaying here, everything else is NA. So that's a left join. It doesn't affect the left-hand table if there's no match. Now, why didn't these things match? Well, as it turns out, computers are very literal. And so the computer wants to, in this case, match on the character C3PO. We typically say in English, C3PO. But this is, and that's true. That's what the character's name is, C3PO. In the Star Wars data frame, the character's actual name is C-3PO. But in the favorability data set, the character's name is listed as C-3P0. And so there's no match because to a computer, those two things are completely different. And so since there's no match, there's no favorability rating data that's brought over. So let's talk a little bit about some of the other kinds of joins you can do. I'm going to skip down here to where I start to talk about joins. I have a nice URL right there that takes you to the documentation at the plier, which fully explains what the different kinds of joins do. We just did a left join. Just for convenience, the next join I want to do is an anti-join. Anti-joins are explained right here. And basically tells you that what happens is, if there's not a match, you don't retain the final table. You don't retain the non-match data in the final table. So that's a really easy way to keep only the data that does match, right, is with an anti-join. So again, we're joining on name, favorability rating. So in this case, I'm actually going to start with what was formerly my right-hand table. And I'm going to do an anti-join because the right-hand table only has 14 names in it, the formerly right-hand table. The other table has 87 names on it. So I don't really want to see the names in the 87 row table that don't match. I know I only have eight matches, which means there are 79 non-matches. I don't want to comb through the 79 non-matches. I want to use a smaller set. So I'm going to start with my smaller table and see what didn't match in the larger table. And to do that, I do an anti-join. And I come back with these six names that presumably didn't match in the larger table, although I know all of these characters are in the larger table. And they're all spelling issues. So some of the things, some of the strategies I could do is I could remove spaces in the match names. I could reduce everything to lower cases. I could remove any special characters. I could remove any diacritical remarks. For example, Padme might be spelled with a diacritical E or an accented E. But here I now have a nice list of those characters that didn't match in Star Wars. So I have an easier, so I have something to go on here. I have some strategy. Let's talk about one other join, the inner join. And what the inner join does, different from the left join. The left join, remember, keeps everything the same in the left table. The inner join keeps everything the same in the left table as long as there's a match. So I'm going to start with my big left table, do an inner join, just like I did before with left join, on the key name. And when I run that, I get only those eight rows that were matching before, right? Let's scroll back up. Remember, the matches in this case, because of the way I sorted it, there were only eight. And then I would have to do something to drop those other rows. If I knew I needed to drop them. Or I could just do an inner join, and that does the same thing.