Well, hello everybody, and welcome to today's live stream. I'm Monica Wahee, and today I'm going to start showing you some really cool plots. This is part of my Great R Packages for Health Data Analytics series, and it's also one of my R Tricks for SAS Users. Oh, welcome, Arikanth, I'm glad you're here. Make sure you download the slides, everybody, because I'm going to show you some links.

On the screen right now, in the slides I'm sharing, you'll see four plots: this Likert plot, this UpSet plot, this dumbbell plot, and this scree plot. These are unusual plots; they're not the ones you normally see in biostatistics. In today's live stream we're going to cover the Likert plot, and we'll cover the other ones in the future, so make sure you follow our company page to find out when. I'm going to show you all of them in R.

I'm a data scientist in the health domain, because I'm an epidemiologist, and I use both SAS and R. What I'm going to show you, you can do in R, and you don't need to worry that SAS doesn't do these plots, because you can still do the analysis and get the numbers out of SAS. I'll even show you that today with this Likert plot. Oh, it's good to see you, Sunil. Now, if you're an R user, you're probably familiar with the package ggplot2, which is a great graphing package. I call these non-ggplot2 plots, but that's not technically correct: they're just not run-of-the-mill ggplot2 plots. I think they leverage ggplot2, but now I'm just getting technical. If you download the slides, you'll see that on this slide you have these links, okay?
Today I'm going to be going over what's on the second link, this Likert plot. It goes along with a blog post, and if you go to that blog post, you can download the code and the data I'm using today to demonstrate. On the next slide, you'll notice it says something about factor analysis. That pertains to the scree plot, so like I said, follow our company page, and when I post that event, you can come and find out about it. All right.

So here we are in R. If you go to that blog post on the Likert plot, you'll see that I've organized it into step one, step two, and so on. Here it is: you can see the numbered steps down here, and that's what I'm talking about in this code where it says step one, step two. But I'm actually going to run all this code and jump ahead to step nine, because I first want to show you what this plot is. You know how in a cooking video you want to see what the dish looks like after it's cooked, before you do the recipe? That's what we're doing here.

Okay, so it comes out like this, but let me just stretch it out. Even if you're a SAS user and you don't really use R, you'll see that this was pretty easy: I just ran all this code, this came out, and I'm stretching it out to make it look nice. Now, if I wanted to save this, I could go over here to File, Save As. I usually choose JPEG, and then I could just save it as a JPEG, right? But first I want to interpret this for you. I'll show you the data set in a minute, but what this data set had was a bunch of Likert scale statements, as you can see here. For the responses, one was strongly disagree, two was somewhat disagree, three was neither agree nor disagree, four was somewhat agree, and five was strongly agree. And I actually took some real answers from a survey.
I just made up the statements; these are gibberish statements. I pulled out just five of them. I was helping somebody do a survey, and there were a lot more items, but I just want to show you how to interpret this.

Now, these five items are in this order because of the results, so let's look at the results. First, I want to show you this x-axis. See this zero in the middle? Then it says percentage, 50, 100 on one side, and 50, 100 on the other. This green side is the agree side, so you can see how many people agreed. If you take this dark green and the light green together, it says 51 percent over here. That's what that means. And this is 36 percent for this one. So what it did was first calculate that, and then sort the items in that order: this first one is 51, the biggest, and then 36, 32, 30, 26. So the most agreed-with item is at the top.

You can imagine, if you're doing a psychometric instrument and these are the five items on one of your subscales, you'd want to see if they're all sort of close or if they're different. Because let's look down here at this one. This is strongly disagree, right? And see, strongly plus somewhat disagree is 60 percent. What's nice about this is you can really tell that, of the disagrees, most of them are strongly disagree over here, and of the agrees, most of them are strongly agree over here. I haven't really gone over the middle part yet. This is the disagree side over here: this is 40 percent, and this is 36 percent.
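The calculate-then-sort idea described above can be approximated by hand. This is a minimal base-R sketch, not the likert package's internal code: it assumes responses coded 1 to 5 as in the legend, collapses 1 and 2 into disagree and 4 and 5 into agree, and sorts items by percent agree. The data frame demo and its values are made up for illustration.

```r
# Made-up responses coded 1 (strongly disagree) to 5 (strongly agree);
# these values are illustrative only, not the survey data from the stream.
demo <- data.frame(
  q1 = c(5, 4, 4, 3, 1, 5, 2, 4),
  q2 = c(1, 1, 2, 3, 4, 5, 1, 3),
  q3 = c(3, 3, 3, 4, 2, 5, 3, 1)
)

# For one item, return the percentage of responses in each collapsed group.
summarise_item <- function(x) {
  c(disagree = 100 * mean(x %in% c(1, 2)),  # strongly + somewhat disagree
    neutral  = 100 * mean(x == 3),          # neither agree nor disagree
    agree    = 100 * mean(x %in% c(4, 5)))  # somewhat + strongly agree
}

# One row per item, then sort so the most-agreed-with item comes first.
summary_tab <- t(sapply(demo, summarise_item))
summary_tab <- summary_tab[order(summary_tab[, "agree"], decreasing = TRUE), ]
round(summary_tab, 1)
```

Each row sums to 100 percent, and the top row is the item a Likert plot would put at the top.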
See this middle one? This gray shows how many people said neither agree nor disagree. Some people call that neutral, but I like to put it as neither agree nor disagree, because sometimes you just don't have an opinion. But see how about 30 percent are here? That's kind of weird; people shouldn't be picking this a lot, because it means they don't have a strong opinion, and sometimes that can be an issue. So this is nice to see.

If you're doing a bunch of surveys, I don't recommend putting all of your Likert items into one plot. I recommend putting in groups of them at a time, so you can compare them, like by domain. That way you can make this visualization, and it makes it easier for you to make decisions. All right, so that's the visualization, and now I'm going to show you how to make it. I'm just looking over here to see if anybody has any questions.

Okay, so I'm going to clear this console here and start at the beginning. This is R, and you can set the working directory; I set it to whatever we're doing today. Step one is reading in my survey data set. I'm going to read it in and show you what it looks like; it's called survey1. Survey1 has a study ID, which is a number that starts with 14755, and if I scroll down, it has 47 people in it. It also has five item columns, called q1, q2, q3, q4, and q5, and you just saw on the plot what they were asked.
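Step one in code might look like the sketch below. The file name survey1.csv and the column names study_id and q1 to q5 are assumptions, and to keep the example self-contained it first writes a tiny made-up CSV to a temporary folder; with real data you would point read.csv() at your own file, for example one exported from SAS.

```r
# Hypothetical example: write a tiny made-up CSV so the sketch runs anywhere.
# In practice, replace csv_path with the path to your own exported file.
csv_path <- file.path(tempdir(), "survey1.csv")
writeLines(c(
  "study_id,q1,q2,q3,q4,q5",
  "1001,5,4,3,2,1",
  "1002,4,4,3,3,2",
  "1003,2,5,4,4,1"
), csv_path)

# Step 1: read the survey data into a data frame.
survey1 <- read.csv(csv_path)

# Quick checks: number of respondents, column names, and column types.
nrow(survey1)
names(survey1)
str(survey1)
```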
They were asked to rate statements. Now, the problem I had with this data set, and I've had this before, is this. Imagine you ask a statement that's really awesome, like everybody agrees with it. Let's say q5 is a super awesome statement that everybody agrees with. People are going to answer q5 with a five or a four, or maybe even a three, but nobody's going to answer one or two. If you get a situation where nobody answers one of the levels, you've got to do a workaround for this plot. That's actually built into this code. It's kind of a kludgy workaround, so please don't laugh at me. It works, and it helps you understand how R works.

Imagine that you're in SAS and you've got a ton of data. You could just trim out these columns. Even if you had a million records, you could trim out these columns, export them, read them into R, and do exactly what I just did.

Okay, so the first thing we're going to do is steps two and three, where we design and make fake data. This is really fun in R, because it's not that easy to do in SAS. Notice how study ID is a column, and q1 is a column, and so on. For our fake data, we're just making columns. I made a column called study ID, which is just a vector with a bunch of these 9999 values in it. Why? Because then I can filter them out again: I know these are fake data. Then see q1: each fake q column is going to be one, two, three, four, five. So the first 9999 row has a one for every question, the next one has a two for every question, and so on. You can see what I'm doing, right?
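A minimal sketch of the placeholder rows from steps two and three. The study ID 9999 and the q1 to q5 names follow the description here, with study_id standing in for the study ID column; the exact code in the blog post may differ.

```r
# One placeholder row per response level (1 to 5), all flagged with
# study ID 9999 so they are easy to filter out again later.
study_id <- rep(9999, 5)
q1 <- 1:5
q2 <- 1:5
q3 <- 1:5
q4 <- 1:5
q5 <- 1:5

# Splice the vectors together as columns of a data frame.
# Row 1 has a 1 for every question, row 2 has a 2, and so on.
fake <- data.frame(study_id, q1, q2, q3, q4, q5)
fake
```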
I'm making sure that each column has every value in it, just to game the system, all right? Then I'm going to take these columns and splice them together into a data frame called fake, using the data.frame command. So let's run that. There's my fake data. It looks just like the real data, only I'm gaming it: I'm making sure it has every value in each column.

Now I'm going to row-bind, with rbind, fake to survey1 to generate survey2. This is a lot like stacking data sets in SAS, where you list both data sets on a SET statement. So we're going to stack these together, and here's survey2; see, I've got the fake data at the end.

Now, here's something that happens in R that doesn't happen in SAS. If I ask for the class, which is the data type, of survey2's q1, it says numeric. We cannot do this plot with numeric data. We have to use ordinal data, and one, two, three, four, five is ordinal, so you have to classify it as a factor in R, which you don't do in SAS. Say you had numbers stored as characters: you could change them to numeric using as.numeric. Well, this one is called as.factor. So we create a new set of variables called q1_f, with f for factor, which are the factor versions of these variables. That's what we're going to do here. Now when we look at survey2 after I ran this, here are the factor versions of all of these. They look the same, but when you check the class, like the class of survey2's q1_f, see, it's the factor class. You're probably thinking, okay, Monica, they look the same. Well, the problem is they behave differently. Factors behave like categorical variables, ordinal or nominal, so one means a category, right? So now the next step, step five: I create a vector called factor levels.
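The row-binding and factor conversion might look like this. The tiny survey1 and fake data frames are made-up stand-ins (in this survey1, nobody picked 1 or 2 for q5), and the loop is one compact way to create q1_f to q5_f; the blog post may write each line out separately.

```r
# Hypothetical stand-ins: a tiny "real" data set in which nobody chose 1 or 2
# for q5, plus the placeholder rows (study ID 9999, one row per level).
survey1 <- data.frame(study_id = c(1001, 1002, 1003),
                      q1 = c(5, 4, 2), q2 = c(4, 4, 5), q3 = c(3, 3, 4),
                      q4 = c(2, 3, 4), q5 = c(5, 4, 3))
fake <- data.frame(study_id = rep(9999, 5),
                   q1 = 1:5, q2 = 1:5, q3 = 1:5, q4 = 1:5, q5 = 1:5)

# Row-bind (stack) the placeholder rows under the real rows.
survey2 <- rbind(survey1, fake)
class(survey2$q1)    # "numeric"

# Create factor versions q1_f ... q5_f alongside the numeric originals.
for (q in paste0("q", 1:5)) {
  survey2[[paste0(q, "_f")]] <- as.factor(survey2[[q]])
}
class(survey2$q1_f)  # "factor"
levels(survey2$q5_f) # "1" "2" "3" "4" "5", thanks to the placeholder rows
```

With the columns in this order (study_id, q1 to q5, then q1_f to q5_f), the factor versions land in columns 7 through 11, which matches the hard-coded column range used later.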
These factor levels are what ends up in the legend, okay? So if you wanted to say neutral instead of neither agree nor disagree, this is where you would do that. I run that, and it's just a vector of these labels. I have to put them in order of one, two, three, four, five; that's how it knows which label goes with which value. Now I'm going to shove the factor levels onto each of my factor variables, the ones ending in _f. It's the same factor levels for each; I'm just shoving them on. I always say shoving because it uses this arrow, which is probably very rude, right?

Now I want to show you survey2 again, because it looks a little different. Remember how a minute ago all of these looked like numbers? Well, now they don't. If you remember SAS formats, you could attach formats to levels of categorical variables. This is kind of the same thing: you attach these factor levels to factor variables, so it's the analogous thing. All right, let's go.

So we've made it through, and now we're on step six. In step six, we want to remove our fake data. The fake data were there as placeholders, to make sure that when we set the levels for something like q4, it wasn't missing one of them. If everybody said strongly agree, somewhat agree, or neither agree nor disagree, and we were missing a level, it would error out at this step. We prevented that with our kludge, but now we have to get rid of the fake data. So we create survey3 by keeping from survey2 only the study IDs that are less than 9999. We'll do that, and I realize it's a kludge, but here we go. Now you can see the number of rows; remember, in our actual survey data set we had 47. All right, the only problem is, let me use colnames here, colnames of survey3: we have a lot of columns.
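Steps five and six, sketched on a single made-up item. Relabelling with levels() and the assignment arrow works here because the placeholder rows guarantee that all five levels exist; removing the placeholder rows afterwards keeps the now-empty levels. As an aside, calling factor() with an explicit levels = 1:5 argument keeps unobserved levels on its own, which might make the placeholder rows unnecessary; that last line is an alternative to the kludge described here, not part of it.

```r
# Legend labels, in order from response 1 to response 5.
factor_levels <- c("Strongly disagree", "Somewhat disagree",
                   "Neither agree nor disagree", "Somewhat agree",
                   "Strongly agree")

# Made-up data: two real respondents who only answered 4 or 5,
# plus five placeholder rows (study ID 9999) covering every level.
survey2 <- data.frame(study_id = c(1001, 1002, rep(9999, 5)),
                      q5 = c(5, 4, 1:5))
survey2$q5_f <- as.factor(survey2$q5)

# Step 5: attach the labels; level "1" becomes "Strongly disagree", etc.
levels(survey2$q5_f) <- factor_levels

# Step 6: drop the placeholder rows; the unused levels are kept.
survey3 <- survey2[survey2$study_id < 9999, ]
table(survey3$q5_f)

# Alternative: explicit levels keep unobserved categories without padding.
q5_direct <- factor(c(5, 4), levels = 1:5, labels = factor_levels)
table(q5_direct)
```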
We're only going to plot these factor columns. So the next step is to create survey4, where we keep just those columns. Notice that we don't want study ID for the plot, and we don't want any of these numeric ones. We want these: this is column seven, and then columns eight and nine, and so on. So I write survey3 followed by square brackets. I want all the rows; that's why I didn't put anything before the comma. And after the comma, I want just columns seven through 11. Hard-coding the column numbers you want in your code is risky, but we're just making a plot data set, so we can get away with it, right? So we'll do that. Now when I check the column names (see, I used the up arrow, and it worked again), survey4 only has these column names in it.

But these column names are not what we want. We want the actual statement to come out on the plot. This is where I was using real survey data, just a few questionnaire statements from a real survey, and this is where I replaced the headings with nonsense statements. I made these nonsense statements and called them var headings, and then I attached them using names, so they became the column headings. I replaced q1_f with "I want to live in a world with unicorns." I think that's how it ended up; let's look at it here. Yes, so now I've replaced them. I know that sounds weird, right? In SAS you could never make a variable name that's a whole sentence, but welcome to R: you get to break some SAS rules. It feels good.

Finally we get to the plot. Remember, in R, if you're using a package, you have to install it first, right? So I already installed the likert package.
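Selecting the plot columns by name instead of by position avoids the hard-coding risk mentioned here. In this sketch the data are randomly generated fakes, the headings other than the unicorn statement are placeholders, and the likert() and likert.bar.plot() calls use default settings rather than the options from the blog post; the plotting step only runs if the likert package is installed.

```r
factor_levels <- c("Strongly disagree", "Somewhat disagree",
                   "Neither agree nor disagree", "Somewhat agree",
                   "Strongly agree")

# Made-up plot data: 20 fake respondents, numeric items plus factor versions.
set.seed(1)  # reproducible fake responses
survey3 <- data.frame(study_id = 1:20)
for (i in 1:5) {
  x <- sample(1:5, 20, replace = TRUE)
  survey3[[paste0("q", i)]] <- x
  survey3[[paste0("q", i, "_f")]] <- factor(x, levels = 1:5,
                                            labels = factor_levels)
}

# Select the factor columns by name rather than by position, so the code
# keeps working if columns are added or reordered.
factor_cols <- grep("_f$", names(survey3), value = TRUE)
survey4 <- survey3[, factor_cols]

# Replace q1_f ... q5_f with the statements to show on the plot
# (placeholder text apart from the first; substitute your own wording).
var_headings <- c("I want to live in a world with unicorns",
                  "Statement 2", "Statement 3", "Statement 4", "Statement 5")
names(survey4) <- var_headings

# If the likert package is installed, build and draw the plot.
if (requireNamespace("likert", quietly = TRUE)) {
  p <- likert::likert(survey4)      # summary object with the percentages
  a <- likert::likert.bar.plot(p)   # default settings only
  print(a)
}
```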
The likert package is what we're using, so we call it up with library. You can see here I'm running the likert command on this plot data set that I worked so hard to make, and that creates this object p. Then I use the likert.bar.plot command, with all of these settings in it, to generate this object a, and then I plot a. This part is complicated, so I encourage you to read the blog post to figure it all out. These are just options I'm setting. If I run p here, you can see p is just the proportions that come out. Then if I run likert.bar.plot on p with all these settings, I generate a, and when we plot a, this is what we get. All right. You can get all of this code, and even an explanation of all of those options, at that blog post.

What we just talked about was producing this Likert plot from survey data. If you're in my audience right now, you've probably done surveys, and you probably do research. You're probably used to making a research protocol or a plan, gathering data, and then making a plot like the one I just showed you. But nowadays, researchers like us are expected to know more than that. We're expected to be able to analyze data from applications. In fact, somebody I was talking to just yesterday told me about one of these data providers where you can log in and analyze encounter data, as well as data from medical records and labs at real hospitals. Real-world data is what it's called. But that person, even though they're really intelligent, was very confused. They said, Monica, I don't know what to connect. I don't know what data sets to connect. I don't know what makes sense. Should I look at inpatient? Should I look at outpatient? What makes sense with my research question?
And I was like, yeah. If you're expected to analyze data from an application, it's really not that straightforward. Because of that, I came up with this workshop called Application Basics: The Big Picture. The big picture is our theme this month. It's an online workshop with the learning objective of understanding data sets from applications well enough to analyze them and produce results. If you come to the workshop, you're going to learn about computer applications: how these applications are designed, the teams that design them, and how the data are stored in the applications. You're going to learn the terminology around application development, so you can start using it to communicate. With this knowledge, you can break through communication barriers to get the answers you need to complete your analysis, and be seen as an expert.

So here are some details about the workshop. Again, it's called Application Basics: The Big Picture. It's Saturday and Sunday, March 23rd and 24th. Each session starts at noon Eastern time and lasts about three hours, and it's an interactive online workshop on Zoom. The normal price for a two-day interactive workshop like this, where you can network with data scientists, is about $250 to $750. But lucky you: because you attended my Likert plot workshop today, your cost is free. I have not found another workshop like this that delivers this information to a research audience, and I have gotten a lot of very positive feedback from participants. So I really hope you'll sign up for our workshop. And again, follow our company page and make sure you stay up to date about our events, because I'm going to show you how to do those other three plots you saw. Especially if you're into psychometric analysis, like making psychometric instruments or analyzing data from them, I'm going to have something on factor analysis.
So you're going to want to know about that. Next time you do a survey, definitely use the Likert plot. It's really great for interpretation. Let's say you've got a product, and you've got statements about this product that are all positive, like: the product was fun to use; the product was easy to use; the product was intuitive to use; the product made me feel comfortable; the product made me feel confident. You can throw them all into one Likert plot, and it'll sort them out for you. If the product did not make people feel comfortable, that item is going to be at the bottom, right? It really helps you with the first pass at sorting out Likert data. I remember I used to get Likert data and think, well, what do I do? Do I just report the percent that agree?

And what you're never supposed to do is take a mean of it. All right, Sunil? That's the wrong thing to do, because then you're not handling it as an ordinal variable; you're handling it as a continuous variable. It's a common novice mistake. But here I am shaming novices for making means out of ordinal variables, when that's what we all see, right? If you go on Yelp or Uber or whatever, you see, oh, 4.5, I have a good driver. You're not supposed to make a mean, right? So if I'm yelling at everybody that you're not supposed to make a mean out of it, then what are you supposed to do?
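A small made-up illustration of why the mean misleads with ordinal items: the two hypothetical items below have the same mean of 3, yet one is polarized and the other sits entirely in the middle. Only the counts per ordered category, which is what a Likert plot draws, show the difference.

```r
# Two made-up items coded 1 (strongly disagree) to 5 (strongly agree).
polarised <- c(1, 1, 1, 5, 5, 5)  # people either love it or hate it
lukewarm  <- c(3, 3, 3, 3, 3, 3)  # everyone picks the middle option

# Treating the codes as continuous gives the same mean for both items...
mean(polarised)  # 3
mean(lukewarm)   # 3

# ...but the distributions across the ordered categories are very different.
table(factor(polarised, levels = 1:5))
table(factor(lukewarm, levels = 1:5))
```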
You're supposed to do the Likert plot. Well, thank you, everybody, for showing up today. I really appreciate it when you come to my live streams, because I don't like talking to nobody, and I just love seeing everybody's faces here on Zoom. I hope you have a wonderful Tuesday and a very good week.

Thank you for watching this video, which is part of the Public Health to Data Science Rebrand Program. If you are interested in joining the program, please sign up for a 30-minute Zoom interview using the link in the description.