Well, hello, everybody. It's nice to see you today. Welcome to our live stream. I'm Monica Wahee. Let me make sure that my video is on. It looks like it is. Looks like my audio is on. Hopefully you can just talk to me in the chat if you have any questions. Today we're going to go over great R packages for health data analytics. This is actually part of a series of four live streams that I have on four different plots. They're plots you can do in R, but I'm also a SAS user. So if you're a SAS user too, don't worry. I explain how to do it in R, and you basically just need the summary data to be able to do these plots. So this is a quartet, I guess. The first one of the quartet is the Likert scale plot, which I presented at our last live stream. If you go to the company page on LinkedIn, you'll see where I posted the recording of that live stream, so you can catch it. Then today we're doing the dumbbell plot. See this image? This is the one we're doing today. The next live stream is going to be the UpSet plot, and the one after that is going to be the scree plot, when I talk about factor analysis. So in R, you have base R. R is open source, so you just download and install it. And you can add packages to R to enhance it, which, if you're a SAS user, is kind of like adding components. If you add packages to R, you can do these specialized plots. And so today I'm going to focus on this one plot, the dumbbell plot. Welcome, Roshan. I'm glad you're here. I was just saying that I'm having four live streams on these plots. The Likert plot live stream already happened, and you can see the recording. Today is the dumbbell plot live stream. I have also made blog posts about these plots, and we're going to look at the blog post for the dumbbell plot today. But in order to get to those, you should really download the slides.
But for now, I'm going to stop with the slides and go over to the blog post, which is linked to on the slides. This blog post is "Dumbbell plot for comparison of rated items: which is rated more highly, Harvard or the University of Minnesota?" So here is what's sort of different about what I'm presenting today. First of all, I'd never seen a dumbbell plot before, and where I first saw one, they were using almost literally the same colors, blue and red. They were comparing Democrats to Republicans on their opinions. The scale went up to 100%, or something like that, and the Democrats, which were blue, were at like 40%, and the Republicans were at like 80%, and they were colored red because that's their color. And welcome, welcome, Elias and George. I'm glad you're here. Make sure that you download the slides in the chat there. So that's the way dumbbell plots are usually used: on aggregate estimates, right? But what I used it for was comparing different aspects of the same kind of thing. So here's the University of Minnesota, Twin Cities, which is where I got my degrees. And this is Harvard University, and I live by Harvard now. And, you know, Harvard has this great reputation. But you see how there are all these aspects that you could compare about each of them. So what the dumbbell plot is good for, and this is probably why it's not that popular, is comparing exactly two things. Two things on multiple points, multiple, I guess, domains, multiple aspects. Recently I was on a board, and the board did a request for proposals, and people submitted proposals. Now, there weren't that many proposals, so we didn't narrow them down to two, but sometimes that's what happens: we narrow them down to two and we're rating them on different points.
So we could have used a dumbbell plot then to compare the last two and see them on all those different features. Because when you look at these numbers, and by the way, this is from Rate My Professors, which is not my favorite place, because it's people rating their professors. But let's say I put University of Minnesota in here and I search. See these images? I took this from this website. I actually went to the Twin Cities campus, and my brother went to UMD, the University of Minnesota Duluth. So let's look at Duluth, right? That's a small town. Sorry for all these ads here. So see: safety, location, happiness, reputation. Actually, Duluth is kind of a nice city. I have to admit the food is not that good there. So let's say you were comparing University of Minnesota Duluth to some other college that had an overall quality rating of 3.7, but you noticed their food was really good. Well, then how do you compare the two fairly? It's hard to look at that. So that's what I did over here in the dumbbell plot, and now I'm going to explain to you how I made the dumbbell plot. Actually, let me just show it to you first to remind you what it looks like. See, I put all of the different features on this y-axis, and then there's a dumbbell for each feature, like this is social, this is safety. The reason why these are just on top of each other is that they got the same score, right? And then, as you can see, here's Harvard and here's U of M. Those of you who really know what you're doing might notice that the University of Minnesota color is maroon, and Harvard has the same color. So I just chose blue for the University of Minnesota, because Minnesota is really cold, and then I could remember it was blue. But anyway, yeah, technically they use the same color. So here, remember the food, when I showed you Duluth? At the University of Minnesota, I think the food's pretty good over there. It didn't get that good of a score. In fact, about the Harvard food, I would argue with this. I think the University of Minnesota is good.
But anyway, so here we have the University of Minnesota doing better. It's better with clubs, right, and it's better with social. So this is how you can quickly see it: already we can see that the blues are on the right, and the right is the highest score. So if you're comparing these two, you may want to go to the University of Minnesota. So anyway, I'm going to show you how to make a plot like this. But before you make the plot, you have to have these scores, which is why I went here and got some scores, so I could demonstrate this and just be fair about it. All right. So here's my R. I'm going to go through this R code, and you can download the R code from that blog post. So first I set the working directory. I'm on my laptop today, so I set it here. If I run it, it just sets the directory where it's going to put things or import things from. All right. The first thing I'm going to do is make a data set. You probably realize that if you're in SAS, it's hard to make a data set. But in R it's really easy, and I'll show you. See this? It says order, and then I put c, and then 12, 11, 10, and so on. This is creating a vector called order. Run it, and then you can see it. Okay. So it's called order and it's got these values. This one is creating a vector called char, and it has these values. As you might guess, this means characteristic, right? So those are those different characteristics. So now I have this vector here. These are the U of M scores. And I've got this vector, the Harvard scores. So I make these vectors, and basically I'm making four columns, right? Now I'm going to use as.data.frame with cbind on order, char, the U of M scores, and the Harvard scores to make a data frame called webpage. So let's do that. Ta-da. We have this data set over here. So this is what our data set looks like. Remember, I made these vectors and then I just put them together.
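The data set construction described above can be sketched roughly as follows. This is only an illustration: the object names (`char`, `um_scores`, `harvard_scores`, `webpage`) are assumptions based on how they are spoken in the stream, and the scores are placeholders, not the ratings shown on screen. The actual code is in the blog post.

```r
# Placeholder values and assumed names; not the real ratings from the stream.
order <- c(4, 3, 2, 1)                          # controls row order on the plot
char <- c("Food", "Safety", "Clubs", "Social")  # the characteristic being rated
um_scores <- c(3.5, 4.0, 4.4, 4.2)              # University of Minnesota scores
harvard_scores <- c(3.9, 4.0, 4.1, 3.8)         # Harvard scores

# cbind() glues the four vectors together as columns of a matrix, and
# as.data.frame() turns that matrix into a data frame with four columns.
webpage <- as.data.frame(cbind(order, char, um_scores, harvard_scores))
webpage
```

Because one of the vectors holds text, `cbind()` stores everything as text, so the score columns do not come out numeric.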
But there was something I didn't think about before I did it, and that is that if you make a data set that way and you use numbers, the numbers often aren't seen as numeric, right? So if you see this class call here, I'm getting the class of the U of M scores, and it's character. So I realized I had to convert those scores to numeric. Whenever I modify columns, I just create a new column with the new value, right? So I created these new columns, order underscore n, U of M scores underscore n, and Harvard scores underscore n, and they're just the numeric versions of those. Okay. So now it looks like there are going to be duplicates. If I print webpage and we look at this, see, there are extra scores. But see the underscore n? I know that's the numeric one, and you can kind of tell, because the character version displays differently from the numeric one. All right. And hi, Ron B. J. I'm glad you're here. All right. So now we've set up our data set. Our data set looks like this. We're going to be able to use this order column to sort these in order. Of course, you can put these in any order you want. This characteristic column is what we're going to display on the plot. Well, actually, we wouldn't use this one, we'd use this order underscore n to sort it, and these are the scores we can plot. Okay, now we're ready. The next thing we're going to do is create colors. So this is a vector with a color in it, and this is a hex color. Remember how for the University of Minnesota I had to choose blue, because it's cold in Minnesota and, you know, maroon was taken by Harvard? Well, this is a hexadecimal color. See this octothorpe, or pound sign? I like to say octothorpe. If you go to a color picker, I just searched for color picker here, and you put this value in, that's literally this color. And if you go back, let's see here, then this one's literally going to be the maroon color.
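A minimal sketch of the two steps just described, converting the character columns to numeric copies and storing the hex colors. The column suffix, the variable names `um_col` and `harvard_col`, the scores, and the hex values are all assumptions made for illustration; the stream does not spell them out.

```r
# Rebuild a small all-character data frame (placeholder values, assumed names).
webpage <- as.data.frame(cbind(
  order = c(3, 2, 1),
  char = c("Food", "Safety", "Clubs"),
  um_scores = c(3.5, 4.0, 4.4),
  harvard_scores = c(3.9, 4.0, 4.1)
))
class(webpage$um_scores)  # not numeric, because cbind() mixed numbers with text

# Keep the original columns and add numeric copies beside them.
# as.character() first keeps this safe if a column was read in as a factor.
webpage$order_n <- as.numeric(as.character(webpage$order))
webpage$um_scores_n <- as.numeric(as.character(webpage$um_scores))
webpage$harvard_scores_n <- as.numeric(as.character(webpage$harvard_scores))

# Hex colors saved under memorable names (illustrative values only).
um_col <- "#2C7FB8"       # a blue standing in for the University of Minnesota
harvard_col <- "#A51C30"  # a dark red standing in for Harvard
```

Building the data frame directly with `data.frame(order = ..., char = ...)` would keep the numeric columns numeric and make the conversion unnecessary; the sketch follows the `cbind()` route only because that is the one described in the stream.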
So I'm just saving them in these variables so I remember them, with names like U of M underscore col, so I'll remember which color it is and I can call it up in the plot. All right. Now, before our live stream today, I actually updated my version of base R, and I had to reinstall these packages. This uses ggplot2, ggalt, which is where you're going to get the dumbbells from, and also the tidyverse. I'm not sure what I use that for, but we have to run all this, so let me make sure it's run. I keep talking about it. Okay, now we're finally ready for the plot. This is a very long call for the plot. This is something I hadn't done before, where you just put ggplot, then both parentheses, then a plus, and then you just start putting in the other ggplot code. The way ggplot2 syntax works is that you declare an object that you're going to put on the plot, and then you set all these attributes of that object. So we're putting these segments on, you know, those lines across, and we're also putting dumbbells on. Okay. And then down here I made a handcrafted legend, because it was just too hard to make one the regular way. So first let me run the whole thing, and then I'll go through this code one piece at a time. Let's run the whole thing to the screen just so you can see it run. Okay, so there it is, right? You see, these are the labels here. Remember, the scores are like 4.3 and all that. Here are the U of M scores and here are the Harvard scores, and this is a handmade legend. So just to maybe jump ahead to the legend, see this here where it says geom_rect? That's a rectangle, and it has the aes option, or argument, I guess, is the right word in ggplot language. The ggplot language is kind of like a sub-language of R, and like I was telling you about that syntax, you keep adding pluses and then you keep adding objects. So literally this legend is added at the end, you know, after the plot is constructed. In fact, let me just run it from here and show you.
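The layering pattern described here, an empty `ggplot()` call followed by objects added with `+`, might look something like this. It is a sketch under assumptions, not the code from the blog post: the data values, column names, colors, and point sizes are placeholders, the handmade legend is left out, and it assumes the ggplot2 and ggalt packages are installed.

```r
library(ggplot2)
library(ggalt)  # provides geom_dumbbell()

# Placeholder data with assumed column names; not the real ratings.
ratings <- data.frame(
  char = c("Food", "Safety", "Clubs"),
  order_n = c(3, 2, 1),
  um_scores_n = c(3.5, 4.0, 4.4),
  harvard_scores_n = c(3.9, 4.0, 4.1)
)
um_col <- "#2C7FB8"       # illustrative blue
harvard_col <- "#A51C30"  # illustrative dark red

p <- ggplot() +
  # Layer 1: a light guide line across the plot for each characteristic.
  geom_segment(
    data = ratings,
    aes(y = reorder(char, order_n), yend = reorder(char, order_n),
        x = 1, xend = 5),
    colour = "grey80"
  ) +
  # Layer 2: the dumbbell, running from the U of M score (x) to the
  # Harvard score (xend) on the same row.
  geom_dumbbell(
    data = ratings,
    aes(y = reorder(char, order_n), x = um_scores_n, xend = harvard_scores_n),
    size_x = 4, size_xend = 4,
    colour_x = um_col, colour_xend = harvard_col
  ) +
  labs(x = "Score", y = NULL) +
  # The theme goes last and replaces the default gray background.
  theme_classic()

p
```

Each `+` adds one more object to the plot, so a layer can be dropped, or the code run only up to a certain `+`, to see what each piece contributes.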
This will put it there without the legend, and you'll notice this gray background. That's because the last thing I call is a theme. See where I put plus theme_classic? That's what adds that sort of classic look to it, so that's the last thing you do. But this part calls a rectangle, and you can see by the aes argument that I'm saying the x minimum is 2 and the x maximum is 3. You can see it on here, or actually, I'll run the whole thing so you can see it. That's why ggplot is nice: you can kind of see what you're doing. See, 2 is down here and 3 is over here. So I made this rectangle here, and then I annotated this point. I made this U of M colored point here and I put the text next to it, this U of M score. This backslash n just puts a line break there. And I put another point, here's my other point, and I'm giving the x and y coordinates of the point. This is the annotate command in ggplot. So as you can imagine, you could put anything on this plot you want, anywhere you want. You can put p-values on it, whatever you want. But for the main graphing part, I needed to just figure out how to do these dumbbells, so that's ggalt, and that's this one here. Actually, this is kind of weird: if you just run the segment part, it just sets up the plot and there's nothing on it. You actually have to add the dumbbell here and explain it. And what's kind of going on here is that you have y and yend, for the end, and you have x and xend. So y and yend figure out how long this is going to be, and x and xend figure out how wide this is going to be. And you have this again over here when you're plotting the dumbbell: x is the U of M scores and xend is the Harvard scores. See that? So between the x and the xend, it goes whatever direction it needs to go, I mean, like when Harvard's higher, it'll be over here. So that's how you do this, and then you run the whole thing, well, I don't know about the whole thing, but you run this
whole thing and it looks really nice and classic, because I put that theme on there. And then I encourage you to go to my actual blog post, because there's ggsave, which I used. In fact, let me put the link to today's slides in the chat so you can go to that blog post, because you also want to learn about ggsave. ggsave automatically exports this plot. See, it's the dumbbell plot PNG, with units in inches, width 8, height 5.5, and this is literally inches, and dpi 300. And I can go over there, let me see, where I mapped the directory, here it is, and I can just open it up. See that? See the shape? It followed my rules about exporting as a PNG with these inches and so on. So I strongly encourage you to look into ggsave. That's the main reason why I put that link there. All right, so now I've shown you this dumbbell plot, and I've also shown you sort of how to interpret it. You know, just looking at it here, if you are somebody who really cares about location, happiness, clubs, opportunity, overall quality, internet, it sure looks like the University of Minnesota. But if you don't really care about that, or you care more about food, apparently, it says here the overall quality is a little higher, and the reputation, well, it says they're right on top of each other for this. I was actually surprised by this data. But as you can see, it's really hard. I started this project, actually, because I was trying to compare hotels. I don't know if you saw one of my posts on LinkedIn, but I was trying to compare two hotels that were right next to each other, and I had to pick one, and I was like, which one? They were so different, and I said, you know, this is a case for the dumbbell plot. One thing you saw me do in this demonstration is go to that Rate My Professors site, and I did that query and stuff, and you know, there were ads everywhere, but I got some data out of it. In fact, I was using that data to demonstrate what we were doing today. And I was trained in research, so a lot of you listening today
you're probably trained in research too, and if you're trained in research, you're used to using data that has been prospectively gathered according to a protocol, and it's well documented. But the problem comes when you do something like what I did today with Rate My Professors, where you're expected to analyze data from an application, basically, and it gets really messy. You even saw there were two sort of overview scores in that dumbbell plot, so what are you supposed to do? So I developed this workshop called Application Basics, and this April our theme of the month is crossing domains. The learning objective of the workshop is to understand data sets from applications well enough to analyze them and produce results. If you come to the workshop, you'll learn about computer applications. You'll learn the design approaches and team structure, how these people work together to develop storage areas for the data, and how those data are stored. You'll learn terminology used in application development, and with this knowledge you can break through communication barriers to get the answers you need to complete your analysis and be seen as an expert when working with data from applications, like the data from Rate My Professors. So here are the details of the workshop. It's called Application Basics: Crossing Domains, and it's Saturday and Sunday, April 27th and 28th, 2024, so at the end of the month. Each session starts at 12 p.m.
Eastern time, because I'm in Boston, and runs about three hours. It's on Zoom and on a weekend. For a two-session interactive workshop like this in data science, I priced it out to be about $250 to $750 per workshop, but because you showed up today, your price is free. I'm happy that you showed up. Thank you very much, and I hope you have a very good Tuesday and a very good rest of the week. Thank you for watching this video, which is part of the Public Health to Data Science Rebrand program. If you are interested in joining the program, please sign up for a 30-minute Zoom interview using the link in the description.