In this video, we're going to continue where we left off after merging our two state energy ranking data sets, natural gas and total energy. We're going to solve the issue of repeated rows for some states using logical statements and subsetting. So let's get to it.

The issue we're having here is that we've got 18 states listed under their full names that don't have any data, because they didn't produce natural gas and they weren't in the total energy data set. And we've got their two-letter abbreviations coming from the total energy data set, because those rows have the total energy produced data. So what we want to do is get rid of all the rows with full names that don't have any data attached to them.

How do we do this? Well, we can do this using logical statements and filtering, or subsetting, our data frame. So let's make a new data frame, df2. We'll start with df, which is the output of the outer merge done in the previous video. And we're going to make use of the pd.notnull function here, because we want to keep all the rows that have natural gas data (no NaN in the natural gas column), or total energy data (no NaN in the total energy column), or both. In other words, we want to get rid of those rows that have NaN in both columns.

So we'll reference our natural gas column, calling the data frame after the merge: df, square brackets, natural gas. And let's first see how notnull works on the natural gas column alone. So let's go ahead and run this and print df2. We have our columns there, but we only have 33 states represented here. These are the 33 states that produce natural gas, from the natural gas data set. We also want to include those states that had some total energy production even though they didn't have natural gas. So how do we do this?
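Before answering that, here is a minimal sketch of the single-column filter we just ran. The column names ("State", "NaturalGas", "TotalEnergy") and the numbers are made-up stand-ins, not the real merged data, so treat this as an illustration of the idea rather than the exact notebook code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged frame; names and values are hypothetical.
df = pd.DataFrame({
    "State": ["Alabama", "AL", "Vermont", "VT", "TX"],
    "NaturalGas": [np.nan, 10.0, np.nan, np.nan, 50.0],
    "TotalEnergy": [np.nan, 30.0, np.nan, 2.0, 90.0],
})

# pd.notnull returns a boolean Series: True where a value is present.
has_gas = pd.notnull(df["NaturalGas"])

# Indexing the frame with that mask keeps only the rows where it is True.
df2 = df[has_gas]
print(df2)
```

Note that the "VT" row is dropped here even though it has total energy data, which is exactly the gap we still need to close.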
We say we want to keep those rows that have values in the natural gas column, or those rows that are not null in the total energy column. So let's go ahead and run this. Again, we see the same columns, and if we scroll down, we now have all 50 states plus DC. I know the index goes to 68, but that's just because the indices have been broken up: we've gotten rid of some of the rows in the middle, and their index values went with them. If we want to reset that, we can just say df2.reset_index(), and we'll get our sequential numbering back. So we see now that these indices start at zero and go down to 50, so there are 51 rows. Again, 50 states plus DC.

Furthermore, you'll note that we have our old indices as the second column here. We might want to get rid of that. We also might want to get rid of some other things. For instance, we don't really need these rankings. We're just interested in the quantities. So if we want to filter by columns, we have some handy tools at our disposal. In particular, we can use loc to get multiple columns or rows at the same time.

So for example, if we want to select one or more columns, let's make a new data frame, df3, from df2.loc, and we'll use a colon to say all rows. So we're going to get all rows, and then, separated by a comma, we'll put in whatever columns we want. If we only want to keep, say, state, total energy, and natural gas, we can enter them as a list. So we enclose them in square brackets and separate these quoted text values with commas. Then df3, and let's see what we get. Here's our data frame: state, total energy, natural gas. Note that these columns are now in the order that I specified. Before, we had natural gas coming before total energy, and I've switched that around just by the order in which I've stated their names here.
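As a rough sketch of the steps so far, the combined filter, the index reset, and the column selection might look like the following. Again, the column names and values are hypothetical placeholders, and the real data frame also carries ranking columns that are left out here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged frame; names and values are hypothetical.
df = pd.DataFrame({
    "State": ["Alabama", "AL", "Vermont", "VT", "TX"],
    "NaturalGas": [np.nan, 10.0, np.nan, np.nan, 50.0],
    "TotalEnergy": [np.nan, 30.0, np.nan, 2.0, 90.0],
})

# Keep a row if either column has data; "|" is the element-wise OR.
keep = pd.notnull(df["NaturalGas"]) | pd.notnull(df["TotalEnergy"])
df2 = df[keep]  # surviving rows keep their old labels: 1, 3, 4

# reset_index renumbers from 0 and stores the old labels in an "index" column.
# Passing drop=True instead would discard the old labels.
df2 = df2.reset_index()

# .loc[rows, columns]: ":" means all rows; the list sets which columns
# are kept and in what order.
df3 = df2.loc[:, ["State", "TotalEnergy", "NaturalGas"]]
print(df3)
```

Selecting columns by name this way drops everything not listed, which is why the old index column disappears from df3.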
And we've effectively gotten rid of those rankings and that old index. So now we have a much cleaner data set.

Conversely, we can use loc to select one or more rows. We're not going to make use of this later on, so we'll just do one example. With df3.loc, say we want the first five rows, so we give it something like that. We also need to say what columns we want. If we want all columns, we can do that with the colon. And there we go, we have the first five rows. Now, you can mix this up and select a combination of different rows and different columns too if you want, but I think you get the idea here.

And then, last but not least, I'd like to show how to easily display the top energy producers. Say, for example, we want the top five total energy producers. We've got this handy function, nlargest. We say nlargest, 5 for the top five, and then whatever variable we want the largest values from. And so here we see that the largest total energy producers are, number one, Texas, number two, Pennsylvania, number three, Wyoming, then Oklahoma, and then West Virginia, on down.

So those are some handy tools to get started with processing data, changing data types, and, in this video, subsetting the data into a clean data product. Thank you.
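For reference, the row selection and the top-five lookup from this video can be sketched as below. The state codes, column names, and values are invented for illustration, and the 0:4 slice is an assumption about how the first five rows were requested.

```python
import pandas as pd

# Hypothetical cleaned frame; labels and values are made up.
df3 = pd.DataFrame({
    "State": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "TotalEnergy": [5.0, 70.0, 20.0, 90.0, 1.0, 40.0, 60.0],
})

# .loc slices by label and includes the end label, so 0:4 gives five rows.
# The second ":" asks for all columns.
first_five = df3.loc[0:4, :]

# nlargest(n, column) returns the n rows with the largest values in that
# column, sorted from largest to smallest.
top5 = df3.nlargest(5, "TotalEnergy")
print(top5)
```

Because nlargest already sorts its result, there is no need to sort the data frame first.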