In this video, I want to talk to you about creating scatter plots using plotly inside of R. I'm going to use RStudio for this. You can see that I've already opened a new R Markdown file and, to save some time so you don't have to go through the pain of watching me type, I've already typed in everything, so we can just run it. We can see the YAML at the top here: the title is going to be "Scatter plots using plotly for R", created by me, and it's going to be exported as an HTML document in the end.

Our first code chunk is just the standard setup, except for a few things. I've already saved this file to a folder on my computer, and I'm going to set the working directory to getwd(), get working directory. So it finds out which directory this file is in and sets that as the working directory. We're going to import two libraries, plotly and dplyr, so we can just run that code chunk and it should execute with no problem.

Then I'm also going to introduce a bit of CSS, a cascading style sheet, because I want my h1, h2 and h3 headers to be colored in a certain way. So I write the opening and closing HTML style tags, <style> and </style>, and in the opening one I say type="text/css". All I'm going to do is use these hexadecimal values to set the color of each of the headings. That's all we're going to do. I also want a nice little logo, my research group's logo, to appear. So it starts with a bang, or exclamation mark, and then open and close square brackets with nothing between them, because I don't want any text there.
I just want the image, and that is the logo image there. This PNG file lives in the same folder as this R Markdown file, and that's why I get the working directory of this file and set it as the working directory: I don't have to refer to the address on my computer's hard drive where this file is located, I can just use the file name as such. Two hashtags means we're going to have a level-2 heading, and I'm going to call it Introduction. It's just the introduction; you can read up about that. The next level-2 heading is going to be Creating simulated data.

What we're going to do here is set the seed to 123 so that all the random numbers that are generated will be the same every time you run the code. And I'm going to create a couple of computer variables: age; WCC, for white cell count; CRP, for C-reactive protein; SBP, for systolic blood pressure; and group. Age is going to come from a random uniform distribution: I want 200 values, with a minimum age of 15 and a maximum age of 85. For white cell count I actually want to round to a single decimal place, so I'm using round(): the first argument is what I want to round and the second argument is digits. I want 200 values from a normal distribution with a mean of 15 and a standard deviation of 4. We're going to make CRP depend on the white cell count, so there's a bit of correlation there.
So I'm just saying: take white cell count and add to it. Remember there'll be 200 values in that white cell count variable, and I'm going to add 200 values to them, drawn from a uniform distribution with a minimum of minus 2 and a maximum of 10, which I then divide by 10. So I could just as well have drawn between minus 0.2 and 1; that would be exactly the same. I'm adding these so that there's some correlation. SBP is also rounded, this time to zero digits, from a random uniform distribution between 70 and 180. Then I'm just going to sample with replacement from this character vector with two elements, treatment group and control group, and I want 200 of those. Then I create a data frame with the columns age, white cell count, CRP, SBP and group, which hold the values inside these computer variables. Let's run that and look at the first six rows. Very beautiful: age (I should have rounded that, shouldn't I?), white cell count, CRP, SBP and group.

So let's create a scatter plot. As I was going to say in the introduction, a scatter plot takes two variables, one for the x-axis and one for the y-axis. For each individual, each row, it takes the two values, creates a little pair, and we plot that on the x-y axes. As simple as that. So I'm going to create my first plot and call it p1; you can call it what you want within the limits of what you can name computer variables. plot_ly() is from plotly, and I'm going to use the format of passing a data frame to plotly, so I say data = df, and then I've got to use these little tildes to refer to each of the columns. So x is going to be the white cell count, y is going to be CRP, the type I want is scatter and the mode I want is markers. With a scatter plot you can actually draw lines as well, going from point to point; you can even leave the markers out and just have lines. So for mode you can say markers, markers+lines, or only lines, but we just want markers.
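Before going further with the plot, here is a minimal sketch of the setup and simulated-data chunks described so far, reconstructed from the narration rather than copied from the file shown in the video. The column names (Age, WCC, CRP, SBP, Group) and the exact CRP expression are assumptions, and the working-directory step is left out.

```r
# Setup: the two libraries used in the video
library(plotly)
library(dplyr)

# Fix the random seed so the simulated values are the same on every run
set.seed(123)

# Age: 200 values from a uniform distribution between 15 and 85
age <- runif(200, min = 15, max = 85)

# White cell count: 200 normal values (mean 15, sd 4), rounded to one decimal place
wcc <- round(rnorm(200, mean = 15, sd = 4), digits = 1)

# CRP (assumed form): white cell count plus uniform noise on [-2, 10] divided by 10,
# i.e. noise on [-0.2, 1], so CRP is strongly correlated with WCC
crp <- wcc + runif(200, min = -2, max = 10) / 10

# Systolic blood pressure: uniform between 70 and 180, rounded to whole numbers
sbp <- round(runif(200, min = 70, max = 180), digits = 0)

# Group: 200 labels sampled with replacement from two categories
group <- sample(c("Treatment group", "Control group"), size = 200, replace = TRUE)

# Collect the variables in a data frame (column names are assumptions)
df <- data.frame(Age = age, WCC = wcc, CRP = crp, SBP = sbp, Group = group)

# Inspect the first six rows
head(df)
```

Rounding age as well would only need wrapping the runif() call in round(..., digits = 0).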
We just want the dots. For something specific about the marker, I pass it a list with only one argument in it: size = 12. That's just a fair size for the dots in my scatter plot. Then we pipe this with %>%, percent, greater than, percent, because I want to add a layout. My layout is going to have three arguments: title, xaxis and yaxis. The title is going to be "Correlation between white cell count and CRP". For the x-axis I want to pass more than one argument, so I wrap them inside a list: title = "White cell count" and zeroline = FALSE. The zero line is the dark line drawn at zero on the x-axis and y-axis, and I just don't want it shown. Then I just call p1. Let's have a look at that.

And there we go. I've created the C-reactive protein to be correlated with the white cell count: we generated 200 extra values up there and added them element-wise, so that there's this bit of correlation. And look at that, I have a lovely plot. As usual, I can zoom in, pan around and go back home, I can save it as a PNG right here, and I can open it up in plotly. When I hover over each of these points I can actually see its values: this one, for instance, has a white cell count of 20.8 and a CRP of 2.8. So that is very good.
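The first plot, p1, can be sketched as follows. It re-creates the simulated data so it runs on its own; the column names are assumptions, and the y-axis settings are assumed to mirror the x-axis ones, since only the x-axis list is described.

```r
library(plotly)

# Re-create the simulated data so this chunk runs on its own
# (recipe as described in the video; column names are assumptions)
set.seed(123)
age <- runif(200, min = 15, max = 85)
wcc <- round(rnorm(200, mean = 15, sd = 4), digits = 1)
crp <- wcc + runif(200, min = -2, max = 10) / 10
sbp <- round(runif(200, min = 70, max = 180), digits = 0)
group <- sample(c("Treatment group", "Control group"), size = 200, replace = TRUE)
df <- data.frame(Age = age, WCC = wcc, CRP = crp, SBP = sbp, Group = group)

# Basic scatter plot: one marker per patient, WCC on x and CRP on y
p1 <- plot_ly(data = df, x = ~WCC, y = ~CRP,
              type = "scatter", mode = "markers",
              marker = list(size = 12)) %>%
  # Plot title plus axis titles; zeroline = FALSE hides the line drawn at zero
  layout(title = "Correlation between white cell count and CRP",
         xaxis = list(title = "White cell count", zeroline = FALSE),
         yaxis = list(title = "CRP", zeroline = FALSE))

# Display the interactive plot
p1
```

Swapping mode = "markers" for "lines" or "markers+lines" gives the line variants mentioned earlier.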
We do have a group variable, so we might as well group by that. Let's have a look: everything is the same, but I'm going to call it p2 so that I have the different plots saved separately. The data is still the data frame, x is still white cell count and y is still CRP, but now I want color = ~group. That has nothing to do with a physical color on the screen; it just says: take whatever you find inside group and look at its sample space, the unique values. If we go back up, we said treatment group and control group, so it's going to find two, and it's going to color them differently. I'm actually going to specify the colors by passing colors a character vector: deep sky blue, which is one of the recognized color names, and orange. The type is again scatter, the mode is markers, and marker is the list with size 12. The layout after the pipe is exactly the same. Let's have a look at what R did for us here with plotly: a beautiful graph in deep sky blue and orange, and now we can see the control group and the treatment group. As usual, I can click on one of them in the legend to hide it; click on the other one and now they're both hidden. If I only want to look at the treatment group, I click the control group away; if I only want to look at the control group, I click the treatment group away; or I can look at both of them. Is that the most fantastic thing you've ever seen?

So let's add a third variable as a color scale. Earlier we said color = ~group, and plotly and R are clever enough to understand that that's a categorical variable. Now we're passing it a numerical variable, age, so it knows to create a color scale.
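The two color variations, p2 (discrete colors by group) and p3 (a continuous color scale by age), can be sketched like this. As before, the data are re-created so the chunk is self-contained, and the column names are assumptions; the only change between the two plots is the variable passed to color.

```r
library(plotly)

# Re-create the simulated data so this chunk runs on its own
# (recipe as described in the video; column names are assumptions)
set.seed(123)
age <- runif(200, min = 15, max = 85)
wcc <- round(rnorm(200, mean = 15, sd = 4), digits = 1)
crp <- wcc + runif(200, min = -2, max = 10) / 10
sbp <- round(runif(200, min = 70, max = 180), digits = 0)
group <- sample(c("Treatment group", "Control group"), size = 200, replace = TRUE)
df <- data.frame(Age = age, WCC = wcc, CRP = crp, SBP = sbp, Group = group)

# p2: a categorical variable in `color` gives one legend entry per group,
# with the two colors supplied explicitly through `colors`
p2 <- plot_ly(data = df, x = ~WCC, y = ~CRP,
              color = ~Group, colors = c("deepskyblue", "orange"),
              type = "scatter", mode = "markers",
              marker = list(size = 12)) %>%
  layout(title = "Correlation between white cell count and CRP",
         xaxis = list(title = "White cell count", zeroline = FALSE),
         yaxis = list(title = "CRP", zeroline = FALSE))
p2

# p3: a numeric variable in `color` gives a continuous color scale instead
p3 <- plot_ly(data = df, x = ~WCC, y = ~CRP,
              color = ~Age,
              type = "scatter", mode = "markers",
              marker = list(size = 12)) %>%
  layout(title = "Correlation between white cell count and CRP",
         xaxis = list(title = "White cell count", zeroline = FALSE),
         yaxis = list(title = "CRP", zeroline = FALSE))
p3
```

Because Group holds two unique values, plotly splits p2 into two traces, which is what makes the click-to-hide legend behavior possible.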
So that's the only difference we've introduced there. And let's have a look at this: a third variable. Now the dots are colored by age, the younger patients being dark blue, and the older they get, the lighter the color gets. This might be very, very useful.

We can even bring in a fourth variable, or just stick with three, not use the color, and just use the size of the actual dots. Here we're going to use both, so now we'll actually have four variables. Is that useful; should you really do it? Is it easy to just look at it and figure out what's going on? Probably not; this is probably overdoing it, but I want to show you the power of using plotly inside of R. So everything is exactly the same for p4: the data is the data frame, x is white cell count, y is CRP and the color is age. Now I'm going to play around with the size of the dots. I take SBP, systolic blood pressure, divide each element by 10 and round to zero digits. You have to know in your mind what that's going to look like: the size we used before was 12, so something around 10 is sensible. Some of these systolic blood pressures are quite high, up to 180, and you don't want huge dots; that's why I divide by 10. On a scale where 10 is a fair size, something between about 10 and 20 is really what you want for the markers, and after dividing by 10 I just round to zero digits. Everything else is exactly the same.
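A sketch of p4 follows, under the same assumed column names. How the video passes the sizes is not shown on screen, so putting the rounded SBP / 10 values directly into marker$size is an assumption; it does match the idea of choosing pixel sizes by hand.

```r
library(plotly)

# Re-create the simulated data so this chunk runs on its own
# (recipe as described in the video; column names are assumptions)
set.seed(123)
age <- runif(200, min = 15, max = 85)
wcc <- round(rnorm(200, mean = 15, sd = 4), digits = 1)
crp <- wcc + runif(200, min = -2, max = 10) / 10
sbp <- round(runif(200, min = 70, max = 180), digits = 0)
group <- sample(c("Treatment group", "Control group"), size = 200, replace = TRUE)
df <- data.frame(Age = age, WCC = wcc, CRP = crp, SBP = sbp, Group = group)

# p4: four variables in one plot
#   x = WCC, y = CRP, color = Age (continuous color scale),
#   marker size = SBP / 10 rounded to whole numbers (roughly 7 to 18 pixels)
p4 <- plot_ly(data = df, x = ~WCC, y = ~CRP,
              color = ~Age,
              type = "scatter", mode = "markers",
              marker = list(size = ~round(SBP / 10, digits = 0))) %>%
  layout(title = "Correlation between white cell count and CRP",
         xaxis = list(title = "White cell count", zeroline = FALSE),
         yaxis = list(title = "CRP", zeroline = FALSE))

p4
```

An alternative is plot_ly()'s own size argument (for example size = ~SBP together with sizes = c(7, 18)), which lets plotly rescale the values into a pixel range instead of dividing by 10 by hand.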
Let's have a look. And there you go, a beautiful plot. Now we have four variables in this one plot for every patient. If I hover over this one, I see this patient had a white cell count of 23.8 and a CRP of 3.2. I can roughly see on the color scale how old they were, and from the size of this dot I know the blood pressure was on the high side, because it's one of the larger dots. So there's quite a bit of information hidden there, and you can well imagine that with a real data set, not simulated as here, there might be a lot of information in such a plot. You might learn something from it, but it's probably overdoing it a little bit. There you go, scatter plots using plotly for