Hey folks, I'm Pat Schloss and this is Code Club. We're in the midst of a series of videos where I'm comparing what people said they intended to do about getting the COVID vaccine with what they actually did. We have data from 2020, generated by Ipsos, where they asked people in 15 different countries whether or not they'd receive the COVID vaccine when it became available. Here we are a year later, and we've been keeping track of those 15 countries and many others to see what their vaccination rates look like. Over these recent episodes, I've been working on bringing those data into R in RStudio so that we can ultimately join them with our data from Ipsos. Along the way, I'm taking this as an opportunity to do a deeper dive on different functions from the dplyr package. In the last two episodes, we talked about the select and filter functions.
After I said goodbye to you all in the last episode, I went ahead and built out the rest of my filter function to put in the names of those 15 different countries. Looking at my code here in RStudio, you can see it's a lot, right? This is not an ideal way to specify what countries we want. If we wanted to look at, say, 30 countries, I would need to list those additional 15 countries, and that's going to get kind of painful. In a few episodes, we'll see how we can get away from having this very long set of OR statements to find these 15 different countries. One of the things I noticed is that my output only has eight rows instead of 15 rows for the 15 countries that I've so carefully listed out here. What I think is happening is that the date line in the filter statement leaves seven countries with NA values, and those rows are then being dropped.
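To make that suspicion concrete, here is a minimal sketch of how a date filter followed by drop_na can quietly remove countries. The data frame vax_toy, its values, and the exact column spellings (location, date, fully_vax) are made up for illustration and only three countries are listed, so this is not the actual code from the episode.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the vaccination data; all values are invented
vax_toy <- data.frame(
  location = c("Brazil", "Brazil", "Canada", "Canada", "India", "India"),
  date = as.Date(c("2021-10-27", "2021-10-28", "2021-10-27",
                   "2021-10-28", "2021-10-27", "2021-10-28")),
  fully_vax = c(52.0, NA, 73.8, 74.0, 22.5, 22.7),
  stringsAsFactors = FALSE
)

# Keep one date and a set of countries joined by OR statements
with_na <- vax_toy %>%
  filter(date == as.Date("2021-10-28")) %>%
  filter(location == "Brazil" | location == "Canada" | location == "India")

# drop_na() removes any row containing an NA, so Brazil disappears here
without_na <- with_na %>% drop_na()

print(with_na)
print(without_na)
```

In this toy case, the row count falls from three to two once drop_na runs, which is the same kind of shrinkage as going from 15 rows to eight.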
So what I want to do with you is go ahead and remove this drop_na and see what we get, and whether or not we have NA values for those seven other countries. Sure enough, looking at this, I now see that yes, we have NA values for those seven other countries for October 28. So what I'm going to do in this episode and the next episode is show you how we can avoid those NA values, so that we're getting the most recent data for these 15 different countries. They might not all be from October 28. Hopefully they're close to October 28. But I'll show you a way that we can do that. Before we can fully jump into this, though, we need to learn how we can arrange data in a data frame.
I'm going to go ahead and put the drop_na back. Again, if we look at owid, we see that we have these eight rows for the eight countries where we had data. Although this isn't the full set of 15 countries, we can work with it to demonstrate a variety of things about how we can arrange the rows in a data frame. Okay. So if we take owid and pipe it to the arrange function, we can then give arrange a column to sort by, right? Let's go ahead and sort by fully_vax. If we say arrange(fully_vax), then our data frame will be sorted by the fully_vax column in ascending order. Sure enough, we now see that India is at the top with 22.7% of people being fully vaccinated, and Canada is down at the bottom with 74%.
What if we wanted it flipped, so that we'd have Canada at the top and India at the bottom? We could again pipe owid to arrange, and instead of simply putting fully_vax, we could say desc(fully_vax). What that says is descending order of fully_vax. So now we have Canada at the top and India at the bottom. That's really nice, right? Because now we can order our data frame so we can more easily see which country is at the top: Canada is at the top with 74%, whereas India is at the bottom with 22.7%.
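The two sorts described above can be sketched roughly as follows. The data frame vax_toy is a made-up stand-in for the one in the episode: only the India and Canada figures come from the discussion, the other values are invented, and the column spellings are assumptions.

```r
library(dplyr)

# Made-up stand-in for the data frame in the episode; most values are invented
vax_toy <- data.frame(
  location = c("Australia", "Canada", "India", "Japan",
               "South Korea", "United States"),
  fully_vax = c(62.0, 74.0, 22.7, 71.0, 72.0, 57.0),
  stringsAsFactors = FALSE
)

# Ascending order is the default: the smallest fully_vax value comes first
ascending <- vax_toy %>% arrange(fully_vax)

# Wrapping the column in desc() flips the order: the largest value comes first
descending <- vax_toy %>% arrange(desc(fully_vax))

print(ascending)
print(descending)
```

With these toy values, India heads the ascending result and Canada heads the descending one.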
Again, the arrange function allows us to sort a data frame by certain values in the data frame. We could have just as easily put in all_vax to order it by the all_vax column. We can also put in the location, right? So we could take owid and pipe that to arrange(location). I think this is the order the data came in by default, so it was already in alphabetical order. But again, we could do reverse alphabetical. How? Yeah, we could do desc(location), so that the United States is at the top and Australia is at the bottom. Again, arrange allows us to change the order of the rows in our data frame.
Something we've seen in previous episodes, when we had these big honkin' data frames and I wanted to show you the end, was the tail function. So we could take owid and pipe that to tail, and this gives us the last six rows. Well, there's also a function head, right? And that gives us the first six rows. Within dplyr, there are special versions of head and tail. So we could say slice_head, and this returns Australia, the first row of our data frame, whereas slice_tail returns the bottom row, the United States, right? If we wanted more than just one row, we could of course say n = 3, and with slice_tail we get the three rows at the bottom of the data frame. We can then combine these, right? We could do arrange(desc(fully_vax)) and pipe that to slice_tail, and what we get are the three countries with the lowest values of fully_vax. Of course, we didn't need to do the desc. We could do arrange(fully_vax) and then slice_head to see those same three countries up at the top, as these are the three countries with the lowest vaccination levels, right? And again, you could do slice_tail to see the three countries with the highest vaccination levels: Japan, South Korea, and Canada.
We could do similar types of things using another function called top_n. So we'll pipe owid to top_n, and we'll do this on, say, fully_vax with n = 3. So we want the three rows with the highest fully_vax value. This returns Canada, Japan, and South Korea. You'll notice that these rows are not in order, right? So we could then, of course, feed this into an arrange statement. One thing to know is that top_n has been superseded, so it's no longer being actively developed. top_n was what I learned when I was learning dplyr. An alternative to top_n is slice_max, so let's go ahead and try that. We can do slice_max on fully_vax with n = 3. Now we get the rows for those three countries that have the largest fully_vax values. What you'll notice is that the difference between top_n and slice_max is that the slice_max output is ordered by that fully_vax column, which is kind of nice, right?
One other thing, if you've looked at top_n in other people's code: if you want the bottom three, you could give it minus three to get the countries with the three lowest values of fully_vax in the data frame, right? That's a little bit confusing. An alternative to that with the slice functions would be slice_min, where we could take owid and pipe that to slice_min on fully_vax with n = 3. And there again, we now get India, the United States, and Australia. And again, it is sorted in increasing order of fully_vax.
Again, these are really convenient functions. I present top_n and head and tail if only so that when you're out there in the wild looking at other people's code, you know what's going on. But I think that slice_max and slice_min are a really nice way to easily get the three rows, or however many rows you want, based on some column that you then want the data to be sorted on, right? So you use slice_max to get the three largest values, or whatever you want n to be, and slice_min to get the smallest values. The other thing is that I do often use tail rather than slice_tail. I'm usually not looking to get a large number of rows out when I'm looking at a data frame.
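As a rough side-by-side of the approaches discussed above, the sketch below runs the sort-then-slice pattern, top_n, and slice_max / slice_min on the same data. The data frame vax_toy is a made-up stand-in for the one in the episode: only the India and Canada figures come from the discussion, the other values are invented, and the column spellings are assumptions.

```r
library(dplyr)

# Made-up stand-in for the data frame in the episode; most values are invented
vax_toy <- data.frame(
  location = c("Australia", "Canada", "India", "Japan",
               "South Korea", "United States"),
  fully_vax = c(62.0, 74.0, 22.7, 71.0, 72.0, 57.0),
  stringsAsFactors = FALSE
)

# Sort, then take rows by position: the three lowest values
lowest_sorted <- vax_toy %>% arrange(fully_vax) %>% slice_head(n = 3)

# Same sort, but take the last three rows: the three highest values
highest_sorted <- vax_toy %>% arrange(fully_vax) %>% slice_tail(n = 3)

# top_n() keeps the rows in their original order (superseded function)
top3_old <- vax_toy %>% top_n(3, fully_vax)
bottom3_old <- vax_toy %>% top_n(-3, fully_vax)

# slice_max() and slice_min() pick the rows and return them sorted
top3 <- vax_toy %>% slice_max(fully_vax, n = 3)
bottom3 <- vax_toy %>% slice_min(fully_vax, n = 3)

print(top3_old)
print(top3)
print(bottom3)
```

The design difference shows up in the row order: top_n leaves the selected rows wherever they sat in the input, while slice_max and slice_min return them already sorted on the column you picked.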
So we could do the same thing that we have here with slice_tail using vanilla tail: if we remove the slice_ prefix and look at that, and then look at it with slice_tail, yeah, the output is the same for the two. If you're in the mindset of using slice_ something, like slice_min and slice_max, there's also a slice_sample that allows you to randomly pick rows. Let's see what that does. We'll take owid and pipe that to slice_sample with n = 3. This gives us three random rows out of the data frame. This is helpful in simulations where you're trying to randomly grab rows out of a data frame. But for this example, it's not super useful.
So let's come back to the original question that we started this episode with, which was the problem of the NA values on October 28 for seven of our 15 countries. We might ask: what is the latest date that we have data for, say, Brazil, which was one of the countries that didn't have October 28 data? Well, I'm going to go ahead and insert a line here and do filter(location == "Brazil"). If I look at owid, I now know that this is only Brazil data, and obviously this goes back to February of 2020. I could then say, well, let's take owid and look at the end, right? So let's do slice_tail with n = 10. We get the last 10 rows of the Brazil data, and we see, oh, there is data for October 27. So perhaps we could begin to think about doing something like taking owid and piping that to drop_na on fully_vax, right? That gets rid of all the rows where fully_vax is NA. And then we could do slice_max on the date column. This returns the row of data for Brazil on the last day that we had data, right? So now we can begin to think about going back for those seven other countries and getting the most recent date that we have fully_vax data for, right?
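The Brazil steps above might look something like this minimal sketch. The data frame brazil_toy and its values are invented for illustration (the real data go back much further), the column spellings are assumptions, and n = 1 is written out explicitly even though it is the default for slice_max.

```r
library(dplyr)
library(tidyr)

# Invented daily records for one country; the NA mimics a day with no report
brazil_toy <- data.frame(
  location = "Brazil",
  date = as.Date(c("2021-10-25", "2021-10-26", "2021-10-27", "2021-10-28")),
  fully_vax = c(51.0, 51.5, 52.0, NA),
  stringsAsFactors = FALSE
)

latest <- brazil_toy %>%
  drop_na(fully_vax) %>%   # remove rows where fully_vax is missing
  slice_max(date, n = 1)   # keep the row with the most recent remaining date

print(latest)
```

Here the October 28 row is dropped for its NA, so the row that comes back is the one for October 27.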
But in the next episode, which is why you've got to be sure you've subscribed and clicked the bell icon so you know when to come back, we're going to see how we can easily do this type of operation for all 15 of our countries, so that regardless of when we query the database up at the Our World in Data website, we get the most recent data for each country where we have all_vax and fully_vax data. So keep practicing with this. I realize this is a little bit brief, but there are a lot of great functions in here. Be sure to play with those slice functions. Give the arrange function a shot, and see what you like better: slice_head or the vanilla head, slice_tail or just the plain old tail. And we'll see you next time for another episode of Code Club.