It's LinkedIn Learning author Monica Wahee with today's data science makeover. Watch while Monica Wahee demonstrates making a dumbbell plot in R.
Hi everyone! In today's data science makeover, we are going to make a dumbbell plot, and we are going to do it for a specific purpose: comparing ranks on various dimensions between two experimental units. I was writing this book chapter with my forever intern Natasha when I wanted to compare two items on multiple dimensions. I really didn't know how to show it, and that's when I found this dumbbell plot.
So to give you an easy-to-grasp example, I went to ratemyprofessors.com and found that they had rated colleges on several dimensions. I took the ratings for Harvard and the ratings for my alma mater, the University of Minnesota, and put them here on my blog post about this topic. See these different dimensions the schools are rated on? Location, reputation, internet, food. But which school is better? In other words, how do you visualize this in a way that makes it easier to decide which one to choose when you have so many dimensions? That's what I'm going to show you how to do with this dumbbell plot in R.
So before we make a plot out of these values, it's best if we make a data frame. It's so easy to assemble data frames in R, and you can do it many different ways. It's kind of like building with Legos. When I have data like we do today, where the entity and attributes are obvious, two colleges and then all these attributes, I like to make a bunch of vectors, one to represent each column in my data frame, and then I use the cbind command to column-bind them together. I'll show you. See these vectors I'm making? The first is called order and just has a sequence in it. You will see it is ordered from 12 down to 1. I use that column to order the dumbbells in my plot. I admit this was added later, after I fussed with the plot, but I can show you how I used it in my code.
The second vector is called char, for characteristic, and has a characteristic label: quality, reputation, etc. Only there were mysteriously two quality metrics for some reason. There was a quality metric on the landing page for the college, and then a different one, with a different value, that you could find by actually reading the different attributes on the page. Weird, huh? They must be calculating each one differently, so I just solved it by including both. Then the next two vectors are U of M scores, for University of Minnesota scores, and HVD scores, for Harvard scores. The important thing about my trick for making a data frame is that all the vectors have to be the same length.
Okay, here is the line where I use the as.data.frame command with cbind to bind order, char, U of M scores, and HVD scores into one data frame called web page scores. Let's highlight and run this code to make our data frame.
Okay, but one problem with my way of making a data frame is that I don't take the time to tell R what class I want each of the variables to be. We are going to be making a dumbbell plot of U of M scores and HVD scores, so they better be numeric. But are they? Probably not, the way I made my data frame. Let's run this class statement on the U of M scores to see. See? Character. So let's convert these to numeric.
This is where recovered SAS users like myself rejoice. It is so, so, so easy to convert a variable from character to numeric in R. See this? I just decided to convert all my numeric variables into variables with the same name, just followed by _n. And all I have to do on the right side of the equation is use the as.numeric command. It works like a dream. Now, if I had a real character in there, like the letter A, or more appropriately the letter R, it wouldn't be so happy about the conversion. But if it can convert everything, it does so without complaining. Let's run this conversion code.
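A minimal sketch of the data frame steps described so far, using base R only. The scores are made-up placeholders, not the actual ratemyprofessors.com ratings, and only four characteristics are shown instead of twelve. The exact spellings of the names (`char`, `UofM_scores`, `HVD_scores`, `webpage_scores`) are guesses based on how they are spoken in the video.

```r
# Placeholder values only: these are NOT the real ratings from the video.
order <- 4:1                                   # the video uses 12 down to 1
char <- c("Reputation", "Location", "Internet", "Food")
UofM_scores <- c(4.1, 4.3, 3.9, 3.6)
HVD_scores <- c(4.6, 4.2, 3.8, 3.5)

# cbind() on mixed numeric and character vectors builds a character matrix,
# so every column of the resulting data frame is character.
webpage_scores <- as.data.frame(
  cbind(order, char, UofM_scores, HVD_scores),
  stringsAsFactors = FALSE  # assumption: keeps columns character on R < 4.0
)
class(webpage_scores$UofM_scores)

# Convert to numeric, storing each result under the same name plus "_n".
webpage_scores$order_n <- as.numeric(webpage_scores$order)
webpage_scores$UofM_scores_n <- as.numeric(webpage_scores$UofM_scores)
webpage_scores$HVD_scores_n <- as.numeric(webpage_scores$HVD_scores)

# A value that is not a number becomes NA (with a warning), not an error.
bad_value <- suppressWarnings(as.numeric("R"))
```

The `stringsAsFactors = FALSE` argument is an addition that is not mentioned in the video. It guards against older R versions turning the columns into factors, where `as.numeric` would return level codes rather than the scores.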
Okay, now we have to designate our colors, which was a problem because the U of M has gold and maroon. And since gold doesn't show up very well, I wanted to use maroon. But Harvard uses so-called crimson, which looks like maroon. So I ended up making the U of M blue, so you'd remember it, since Minnesota is cold and blue. Cold air, warm hearts.
Here's another thing for you recovering SAS users. The color library in R is as big as the hexadecimal color library. Some of you R users might be used to seeing actual color names in here, like pink and gold, as I demonstrated in my box plot video. But you can use hexadecimal colors. If you take these color codes out on the internet to a hexadecimal color picker, it will decode them to some sort of blue for the U of M and, I guess, crimson for Harvard. And see how I just set a variable to the color code? I called the variables UofM_col and HVD_col. That's a good thing to do, because then you just call the variable in your plot. And if you don't like the colors, you can fuss around with them up here, where you set the variables, to get what you want. All right, let's make these variables.
Okay, now we are getting ready to make the actual plot. Here are the libraries you need: ggplot2, ggalt, and tidyverse. ggalt was new to me. So I want to give a shout-out to Connor Rothschild's blog post on how to create dumbbell plots for a different purpose, visualizing group differences, which helped me develop my use case. Please visit his blog post. I'll put a link to it in the description. Let's run these libraries to load them.
Okay, the first thing I have to admit to you is that I kludged the legend. I made it by hand, and that's not the first time I have done this with ggplot2 code, because that's basically what we are doing here, ggplot2. I'm a practical data scientist. If I fight with the plot long enough, I get tired of fighting and just kludge it. It's a plot. It's a display.
It's not like I'm making fake data. So that's my defense. But if anyone out there can get the legend to automatically show up the way I kludged it, please put it in the comments. If I see any good ones, I'll make a follow-up video and show you an example of how to do it right. Is it a deal?
But first, let me walk you through the ggplot2 code. We start by just declaring ggplot() on the first line and not giving it any arguments. Then, as you can see by my nicely formatted code, we form basically two things: geom_segment and geom_dumbbell. Now, you probably know what the geom_dumbbell part is. Those are the dumbbell shapes that appear across the finished plot. So what is this geom_segment? Well, that's actually the key to making this plot look good. Let me just run the ggplot and geom_segment code before the plus and show you what it's doing.
See, we have a blank plot. On the y-axis, we have the different attributes of the school. Remember that order variable I made? That's how I got these to be in the same order as the original display I showed you on my blog. You'll also see the Likert scale they use across the bottom. The labels don't look good, but there's still more work to be done on this plot. But that geom_segment is really necessary to lay down before you put the dumbbells down, so that they look right.
Let's go back to our code. See this geom_segment code? You'll see the main arguments are y and yend, meaning y-end, and then also x and xend, meaning x-end. I like to pronounce them "yend" and "zend", but for you I will call them y-end and x-end. And you'll see I hard-coded x and xend, and I use that order variable, well, technically the converted order_n variable, to keep the y entries in order.
Now let's make our dumbbells. Okay, and here you again see me using the order_n variable to reorder everything, but this time I'm working with x and xend only.
And x is going to be the U of M scores numeric, and xend is going to be the Harvard scores numeric. You'll see I set the color of the dumbbell to black, but at the end of the code we have colour_x set to the U of M color and colour_xend set to the Harvard color. You'll see I use the British spelling of colour because I copied Connor's code and he used the British spelling. R seems to support many different Englishes, I've noticed. Okay, then the last thing we do in the upper part of this ggplot code is add a y label and an x label.
Let's run this dumbbell plot without the legend code at the bottom and take a look at it. I'm remembering that the reason I did not start the x-axis at one was that I had trouble seeing the difference in some of the dots, so I started it at two instead. Still, you will see by this plot that the schools are rated very similarly.
Now, this comparison was for a reason. If you have residency in Minnesota, you can pay $10,000 per year for full-time tuition at the University of Minnesota. I don't know how much Harvard costs, but I've been told it's like at least $50,000 per year. And remember, I was trained at the University of Minnesota, and a lot of disgraced people in politics are trained at Harvard. And look, almost all the blue Minnesota dots are to the right of the Harvard dots. Really, from this set of ratings from students, it looks like Minnesota is a better deal. And I pretty much agree with the ratings, although I disagree with this food rating. The Indian food here in Boston is terrible. I can't take my dad anywhere to eat. The Indian food in Minnesota? Awesome. And the bun moc duck? To die for. Try Bona Vietnamese over in Stadium Village. You'll love it.
Okay, so I didn't solve my lack-of-legend problem in a conventional manner. Let's look at what I did. So here, this is the same ggplot2 code, with just a comment separating it here.
You will see I just drew a geom_rect for a rectangle, and then I used annotate to write in all the points and text I wanted. You can make fun of me all you want, but I really love these artistic tools that ggplot2 gives you to annotate your plot. And I really hate fighting with some sort of automated engine, trying to feed it the right code just to get it to make a rectangle, some text, and some dots. So now I'm going to highlight all the ggplot2 code and run it, and you'll see my kludged legend appear. Voila! See how this cute little legend is up in the left corner? It looks nice. No problems. No worries. Cute little thing.
Oh, and one more thing. Here's a bonus for you. I started using ggsave at the end of my ggplot code when I want to export the plot automatically at pre-specified dimensions. You will see here that I can designate the name, tell it I want to talk in inches, and then tell it the width, height, and dpi. It's great for exporting plots in a certain format.
Thank you for watching this data science makeover with LinkedIn Learning author Monica Wahee. Remember to check out Monica's data science courses on LinkedIn Learning. Click on the link in the description. Thank you for watching this video. If you liked it, please hit the like button. Also, I invite you to look around my channel, and if you like what you see, subscribe. I hope you are enjoying your data science journey. Peace.
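For reference, the plot walked through in this video might look roughly like the sketch below. It assumes the ggplot2 and ggalt packages are installed. The scores are made-up placeholders, not the real ratings. The hex codes, legend coordinates, dot sizes, axis labels, and output file name are all hypothetical. Using `reorder(char, order_n)` is only a guess at how the order variable is applied. The legend rectangle here is drawn with `annotate("rect", ...)` rather than the geom_rect mentioned in the video, just to keep the sketch short.

```r
library(ggplot2)
library(ggalt)  # provides geom_dumbbell()

# Placeholder values only: these are NOT the real ratings from the video.
scores <- data.frame(
  order_n = 4:1,
  char = c("Reputation", "Location", "Internet", "Food"),
  UofM_scores_n = c(4.1, 4.3, 3.9, 3.6),
  HVD_scores_n = c(4.6, 4.2, 3.8, 3.5)
)

# Placeholder hex codes; change them here and every use below follows.
UofM_col <- "#1F77B4"
HVD_col <- "#A51C30"

p <- ggplot() +
  # Guide lines over a hard-coded rating range, laid down before the dumbbells.
  geom_segment(
    data = scores,
    aes(y = reorder(char, order_n), yend = reorder(char, order_n),
        x = 2, xend = 5),
    color = "grey80"
  ) +
  # One dumbbell per characteristic: x is Minnesota, xend is Harvard.
  geom_dumbbell(
    data = scores,
    aes(y = reorder(char, order_n), x = UofM_scores_n, xend = HVD_scores_n),
    color = "black", size_x = 3, size_xend = 3,
    colour_x = UofM_col, colour_xend = HVD_col
  ) +
  labs(x = "Rating", y = "Characteristic") +
  # Hand-built legend in the upper left: a box, two dots, two labels.
  annotate("rect", xmin = 2, xmax = 3, ymin = 3.6, ymax = 4.4,
           fill = "white", color = "black") +
  annotate("point", x = 2.1, y = 4.2, color = UofM_col, size = 3) +
  annotate("text", x = 2.2, y = 4.2, label = "Minnesota", hjust = 0) +
  annotate("point", x = 2.1, y = 3.8, color = HVD_col, size = 3) +
  annotate("text", x = 2.2, y = 3.8, label = "Harvard", hjust = 0)

# Export at fixed dimensions: name, units, width, height, and dpi.
out_file <- file.path(tempdir(), "dumbbell.png")
ggsave(out_file, plot = p, units = "in", width = 6, height = 4, dpi = 300)
```

Because the legend is positioned in data coordinates, its numbers would need adjusting by hand whenever the number of rows or the x range changes.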