So, again, all these slides are still covered under Creative Commons, so as long as attribution is given, you can share them, remix them, whatever you like. We're going to jump into our second script, and the second script here is what you actually did on your break. By the end of this lecture, you'll be able to analyze and graph categorical data in R, create graphics made of multiple plots, and independently start writing and running an analysis in R without a script being written for you beforehand. All right, I'm going to go ahead and clear our yeses. It looks like we're ready to get started. So you all have your... oh, no, this is just a review of day one.

So we just did a little assignment, and you should have created a graph something like this. Here you would have set your working directory and read in your data. You would have inspected your data, looking at the head and the structure of the data. Then you would have created a factor. A lot of questions about this came up on the Slack. You could have just left the levels labeled as 0 and 1 if you wanted to. You don't actually have to add labels here, and the groups would have shown up as 0 and 1. That's totally fine: it's OK to have numbers as the labels for your factors. But often, when people are reading a graph, you want them to know what the zeros and ones represent.

Next, if you're using base R, you could make a box plot like this one, similar to what we did before. A lot of people had issues with the legend. If you reuse the coordinates from the previous example, the legend ends up far outside the plotting area. As you can see, these biomarker values lie roughly between 0 and 1, so if you set the legend's y coordinate to something like 93, it's far outside this graph.
So you want to make sure your legend is somewhere in here: its coordinates need to fall within the range of the graph you're actually showing. That's likely the issue that came up. So here we're creating a legend. I just added these colors; they could be anything you like. Then down here, if you wanted a ggplot graph, you'd create a box plot with ggplot, again similar to what we did before. y is the biomarker, and x is the responder factor that was created up here. Here it actually is important that this is the factor variable, whereas in the base R version it didn't have to be. And the legend comes up automatically because we used scale_fill_manual here. All right, I hope you all found that OK. If you didn't, we can discuss it in the next break.

Now we're going to move on to our next script, and this one is going to be totally from scratch. Open up a new R script; I'll show that again here: the plus sign, then R Script. You'll start from a fresh, clean script. You're going to read in example data 2, and you'll start your script by setting your working directory so that R knows where that data is. You'll need to download the data from the website; the website link is pinned on the Slack. Here is the data, example data 2. It's a TXT file now, not a CSV file anymore, so read in your data using read.table instead of read.csv. It's a new function but works much the same way, with one catch: unlike read.csv, read.table assumes there's no header row unless you set header = TRUE. Then you're going to create three factors. We now have more categorical variables, more different groups, so you'll create a factor for your sex variable, a factor for your site variable, and a factor for your treatment variable.
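The steps described so far (read a white-space-separated file with read.table, then build three factors) can be sketched as follows. This is a minimal sketch, not the course solution: the tiny inline table and its column names (id, age, sex, site, treated) are made-up stand-ins for example data 2, and the path and file name in the comments are hypothetical.

```r
# Made-up stand-in for example data 2 (column names are assumptions).
txt <- "id age sex site treated
1 34 M 1 0
2 51 F 2 1
3 29 F 1 1
4 45 M 2 0"

# With the real file you would write something like:
# setwd("path/to/your/folder")                           # hypothetical path
# df2 <- read.table("example_data2.txt", header = TRUE)  # hypothetical file name
# read.table() splits columns on white space and, unlike read.csv(),
# assumes there is NO header row unless you set header = TRUE.
df2 <- read.table(text = txt, header = TRUE)

# Character codes go in quotes inside c(); numeric codes do not.
df2$sex_factor <- factor(df2$sex, levels = c("M", "F"),
                         labels = c("Male", "Female"))
df2$site_factor <- factor(df2$site, levels = c(1, 2),
                          labels = c("Site 1", "Site 2"))
df2$treatment_factor <- factor(df2$treated, levels = c(0, 1),
                               labels = c("Control", "Treated"))

str(df2)  # three new factor columns are appended; the original columns remain
```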
These are three new variables you're creating in that data frame. Once you've done that, go ahead and click Yes. Now you're really going to be set out on your own. Feel free to post questions, snippets of code, or screenshots in the Slack, and we can all work together to solve any issues you're running into. OK, all right. Feel free to just paste a little snippet of whatever errors you're coming across. Thanks, Gabby. We've got a couple of yeses coming along, and some really nice sharing from Dahlia. I'm sure others are also confused about those parentheses: you only want one pair of parentheses around the whole set of arguments to your function here. Ren also needs a breakout room. OK, no problem. Thanks, Greg. Ren, I might have chosen the wrong user for you; I can't find you in the list. But if that was the right one, feel free to join the room and I'll see you in a moment. Christina, is your DF3 in your environment? I read it in, yeah. I see DF3 there, and I opened it up by clicking on it to look at it. OK, so that's there. How about treated? Is that a column in it? Is it spelled right, with the capitals all there? Yes, but I didn't know if it was different because it's a table now. It doesn't really have the header part. Do you still have to tell read.table about that? I probably just missed adding that part. Yes, you need to set header = TRUE. I didn't add that for this one; I guess I didn't realize it was needed for read.table. OK, that's fine, absolutely. This is trial and error. Without that, it doesn't see the header. Yeah, header. Thank you. No worries. Please don't ask for a breakout room directly; please at least describe your problem. And Gabrielle, can you... I don't know.
Is she here? No, she's not here. OK, so everyone's in breakout rooms. Diego, in the meantime, do you want to just take a little screenshot? It's really helpful for all of us to see any errors you're coming across. Yeah, I can do that. Awesome, perfect. So the levels need to be in quotes: the M and the F there. Carmen, the reason it's giving you errors is that it just needs quotes around those. Exactly. When it's a number, R is fine with no quotes, but anything that's not a number, any character string of any kind, has to have quotes around it, unless it's a variable name. So R is looking for M to be a variable. And F, you'll notice it's highlighted in orange there: R reads it as FALSE. So it's interpreting this as the variable M and the value FALSE. Yep. Diego, you're getting the same issue: put your M in quotes and your F in quotes, and this should run just fine. Reza, you're just missing a closing parenthesis there. Let's see, do you think you could take a wider screenshot? I want to see whether the console is showing a + continuation prompt, because it looks like the issue might come from a line above. I'm not sure I've got everything yet... This is looking awesome. Levels, yep, good. Great. Lambert, that's right; you can go ahead and click Yes, you're good to go. OK, Reza, the problem is that after levels you need a little c in front of the parentheses. Right now it says levels = and then just parentheses with 0, 1. You want to concatenate them, because it's a vector that needs to go there. That's definitely not something that's immediately obvious, so no worries. You'll just want a c in front of that. Can you also explain the first one? Because I saw somebody wrote M and F. Should it be M and F, or is 0, 1 fine here? Aha.
If we look at the data itself, sex is coming in as M's and F's, so you'll want M and F instead of 0, 1: in levels, you want a quoted M and a quoted F. So, if I understand correctly, if a variable isn't coded with a single character, say it has four groups based on BMI, like obese, non-obese, normal, and so on, then inside c() I have to spell out all of those? Yes, if you want to label them that way. But maybe in your data frame they're coded as something like O and N; then you would write those codes out, in quotes, in the levels argument. And if they're not single letters but whole words? Then you have to write the whole word. OK, so it's c(), but in this case the values are not in quotation marks, right? For male and female, you want levels = and then c() with "M" and "F" in quotes, similar to what Dahlia has here; she's pretty close. And Dahlia, sorry, just to jump to yours: you just need DF$sex in that first argument. Reza, you can see that Dahlia has levels = c("M", "F"). OK, thanks. Thank you. Thank you all for sharing these errors; this is really helpful. Don't forget to click Yes once you've created these three factors: you just read your data in and then create the three factors. Dahlia, for yours, it's the first argument here; I'll show you on my screen. Where it says DFSex, you actually want DF$sex. The problem, though, is that it has now overwritten the column, and it's going to be all missing values.
So you may have lost the original values. You may need to read your data in again: go back up to the top, read in your data, and then run this line with DF$sex, but maybe give the new column a name like sex_factor. That way, if you make a mistake, you're not overwriting the original column with missing values. As for as.is, thanks for asking. When a column in the text file you read in contains text, that is, character strings, read.table doesn't create a factor by default; it keeps the values as strings, so R doesn't assume it's a grouping variable. When you set as.is = TRUE, sex is kept as a character string: it's just "M" and "F". If you set as.is = FALSE, R treats it as a grouping variable, and by default the levels are sorted alphabetically, so F becomes the baseline and M the alternative. So this is an example where setting as.is = FALSE might automatically give you a factor for sex, although you'd still want to check which level is the baseline. It can work either way. Awesome, we're getting it. Sorry about that, there we go. Don't forget to click Yes once you have it, and go ahead and experiment: if you want to start plotting and exploring the data, check out the structure and the ranges. Reza, just make sure you put the M and the F in quotes. Wonderful, almost there. As you can see, many of you are having the exact same issues, so sharing your errors is very helpful for everyone. Let's give the last few people a moment to get theirs together, and don't forget to click Yes once you've got it; we'll move on once it's there. I'll give you two more minutes and then we'll go through the solution. Nagla, the levels 0 and 1 are numeric, so they don't have to be in quotes.
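To see what the as.is argument does, here is a small sketch on a made-up three-row table. It assumes R 4.0 or later, where read.table keeps strings as characters by default; with as.is = FALSE the column becomes a factor whose levels are sorted alphabetically, so F rather than M is the baseline until you change it.

```r
txt <- "id sex
1 M
2 F
3 M"

kept <- read.table(text = txt, header = TRUE, as.is = TRUE)
conv <- read.table(text = txt, header = TRUE, as.is = FALSE)

class(kept$sex)   # "character": strings are left as they are
class(conv$sex)   # "factor": converted to a grouping variable
levels(conv$sex)  # "F" "M": sorted, not in order of appearance

# To make M the baseline, set the reference level explicitly.
conv$sex <- relevel(conv$sex, ref = "M")
levels(conv$sex)  # "M" "F"
```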
I know, it's kind of annoying. Numbers don't have to be in quotes, so it can just be 0, 1, but character strings like M and F do. So here it's levels = c(0, 1): just 0 and 1, no quotes. Great question. There are a couple of options. You could use unique, say unique(DF$sex), which gives all the unique values, if you want to see every value that column can take. If it's already a factor and you want to look at its levels, you can use levels, for example levels(DF$sex_factor), and it will show the levels. Right, but say the column isn't a factor yet. For example, for site, I was curious when I was doing it. Because our table is small, it was easy to see the values 1 and 2 just by opening the table and scrolling. But when I'm dealing with a big table, unique is one way to check how many levels there are in the column, right? Yes. Let's see. Lauren, you were typing something, but you're still showing your PDF, right? Or did you want to show what you were typing? No. OK, so you weren't showing it; good, as intended. Yes. Diego, you could do either: add new columns like you show here, or overwrite columns like some others have been doing. In general, I think adding a column is better, because then you can verify that you've done it correctly. If you overwrite it, you lose the original data, so you won't necessarily know whether you've recoded the factor correctly; it's easy to make errors that way. So exactly right: you now have multiple columns added to the original data frame, and those columns are all factors. That's great. All right, everyone, if anyone has additional questions on this front... oh, let's see here, Nagla.
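The checks mentioned above (unique, levels, and recoding into a new column instead of overwriting) might look like this on a made-up five-row data frame. The column names and the deliberately wrong levels are illustrative assumptions; the last step shows why levels that don't match the data silently produce NA.

```r
df <- data.frame(sex  = c("M", "F", "F", "M", "F"),
                 site = c(1, 2, 2, 1, 1),
                 stringsAsFactors = FALSE)

unique(df$sex)   # "M" "F": distinct values, in order of first appearance

# Recode into a NEW column so the original stays available for checking.
df$sex_factor <- factor(df$sex, levels = c("M", "F"))
levels(df$sex_factor)      # "M" "F"
as.integer(df$sex_factor)  # 1 2 2 1 2: the codes follow the order of the levels

# Levels that don't match the data (0/1 for a column holding 1/2) give NA,
# but because we wrote to a new column, df$site itself is untouched.
df$site_bad <- factor(df$site, levels = c(0, 1))
sum(is.na(df$site_bad))    # 2: the two rows with site == 2
```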
Let's go through the solution and see if that clarifies some things. As we saw, there are actually multiple ways to do this, multiple right answers, and this is just one of them. Here, of course, I'm setting my working directory and reading in my data, but this time I use read.table. The reason is that these are tab-separated values now: the values are separated by white space. R reads the file and, since these aren't comma-separated values anymore, it's not looking for commas; it's looking for white space, and each stretch of white space marks the start of a new column. So here in the second row, you can see this is the ID, then there's some white space, and the next column is age. Then more white space, and now I have the sex value. That's how R reads it in. You can investigate it using lots of different functions: names, head, dim, and str, the structure of the data, to see what it looks like. A lot of people brought up other things you might like to know, like the unique values in a column, and you can do that too. I'm just going to show my data frame now. There we go. Now, when creating these factors: that was a really good question, which I believe Reza asked, about what to do when you don't know all the values a column can take. Here's what you can do first. I ran str, saw these columns, and knew I wanted sex, site, and treated to become factors. Sex I'm pretty confident about, M and F, so that factor is fine. But what about the site factor?
There could be 25 sites; there could be any number. So you can use unique: unique(DF2$site). When I run that, I see it only takes two values, so when I write out the site factor, my levels are 1, 2. In this case I want site 1 to be my baseline, so the labels are "Site 1" and "Site 2", in the same order as the levels 1, 2, so they correspond. Similarly for the treatment factor: you could have several levels of treatment, different types of treatment. Again, to see the unique values, we run unique(DF2$treated), and we see it's ones and zeros. I want 0 to be my baseline, untreated versus treated, so I label them control versus treated: 0 is control and 1 is treated. You could have named them anything you wanted; that's totally fine. What ends up happening here, as I believe Diego pointed out, is that I'm creating three new columns. If I open up DF2 from my Environment pane, I can see they've been added. I could also run str(DF2) down here and compare it to the original structure up here: I see the new columns, which levels they have, and the integer codes. Note that these codes are numeric: each one tells you which level the value belongs to, by position. If it says 1, it's the first level, the baseline; if it says 2, it's the second; if there were a 3, it would be the third. It just follows the order of the levels you set out here. All right, going back to creating factors: how can you count the unique values? Great question. When you run unique, you get a vector out.
Remember length? You can wrap length around the vector that unique outputs. Ctrl+Enter, and the length is 2: that's how many unique values there are. OK. I just want to zoom in on Ren's error here. Because you're using ggplot, it's a new geom, and ylab becomes its own function there, Ren. All right. Going back here, let's talk about some quick stats. You're doing stats all the time: maybe you want the mean, the range of values, or some other summary. These are some nice, useful functions, but say you want the variance. Oh, there's no documentation; there's no function called variance. But I can try a double question mark, ??variance, and it will search for things that aren't exact matches. Well, this is terrible. Let's see: matrixStats, weighted... that's not really what I want. Variance-covariance matrix... I think I just chose a bad search term. Here: stats::var. This is the one that gives you the variance, var. So we could run var on DF2$marker1. You can use these quick functions to get stats for your values. Here is mean, the arithmetic mean, so you could take the mean of age or of marker1; sd gives the standard deviation; and range gives the minimum and maximum values. We talked during the break and right after about how, when you put a legend somewhere in your plot, sometimes you couldn't see it. That's when you want to know the range of values a variable takes, so you know which coordinates will actually put the legend inside your plot. range outputs the minimum and the maximum value.
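A sketch of the quick summary functions on made-up biomarker values, including how range can guide legend placement in a base R box plot. The variable names, colours, and the legend position x = 0.5 (just left of the first box) are assumptions chosen for illustration.

```r
marker1 <- c(0.42, 0.77, 0.58, 0.31, 0.66)  # made-up biomarker values
responder <- factor(c(0, 1, 0, 1, 1), levels = c(0, 1),
                    labels = c("Non-responder", "Responder"))

mean(marker1)                   # arithmetic mean
sd(marker1)                     # standard deviation
var(marker1)                    # variance, i.e. sd(marker1)^2
range(marker1)                  # c(min, max): 0.31 0.77
length(unique(c(1, 2, 2, 1)))   # 2 distinct values

# Use the range to choose legend coordinates that fall inside the plot.
r <- range(marker1)
boxplot(marker1 ~ responder, col = c("grey80", "steelblue"), ylab = "Biomarker")
legend(x = 0.5, y = r[2], legend = levels(responder),
       fill = c("grey80", "steelblue"))
```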
So here, if we did range on marker2... actually, sorry, first I want to show you something. If you want to reuse or revise something you've just run, you can get back to anything you've run in the console with the arrow keys. My cursor is here, and I'm pressing the up arrow: that's what I ran just before. Up again gives what I ran before that, and down goes forward again. I'm just moving back and forth through my history. So if you made an error and want to run a line again, go to your console, press the up arrow, and that line is ready for you to revise. What I want to revise here is to use range instead of var. So range, and I get the minimum and maximum values for that marker. Now, aggregate computes that kind of statistic but splits it by groups for you. Here, aggregate is the function. I want to compute my statistic on age, so the first argument is age. I want age split by site, so the second argument is the grouping: for each site, I want the mean age. The last argument is the function. So let's run aggregate with DF2$age, then list(DF2$site), then mean. The second argument has to be a list even if you're only putting in one factor; I'll use two factors next and we'll see what that does. And you just want the name of the function, with no parentheses, even though RStudio added them automatically. Group.1 here is the first splitting variable, site, and the ages are split by it. But wouldn't it be easier to see which site is which? There we go: with the site factor, now we have the names. Instead of 1, 2, it's Site 1, Site 2. But let's say I want it split by both site and sex.
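A minimal aggregate sketch on a made-up six-person data frame, assuming columns named age, site, and sex. It shows the unnamed single-variable version (output column Group.1), a named list that uses a factor for readable labels, and two grouping variables.

```r
df2 <- data.frame(
  age  = c(34, 51, 29, 45, 38, 60),
  site = c(1, 2, 1, 2, 1, 2),
  sex  = c("M", "F", "F", "M", "M", "F"),
  stringsAsFactors = FALSE
)
df2$site_factor <- factor(df2$site, levels = c(1, 2),
                          labels = c("Site 1", "Site 2"))

# aggregate(x, by, FUN): 'by' must be a list, even with one grouping variable,
# and FUN is the bare function name (mean, not mean()).
aggregate(df2$age, list(df2$site), mean)   # grouping column is called Group.1
by_site <- aggregate(df2$age, list(site = df2$site_factor), mean)
by_site

# Two grouping variables: one row per site-by-sex combination.
by_site_sex <- aggregate(df2$age, list(site = df2$site, sex = df2$sex), mean)
by_site_sex
```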
The way you would do that is to add a second element to the list that's the second argument here: DF2 and then sex. Actually, I want it to be the sex factor; M and F isn't enough for me, I want it spelled out. Here we go. It gives me site 1 male, site 2 male, site 1 female, site 2 female. Let's say you want to add confidence intervals, so you need the standard deviations to compute them. Instead of mean, you use sd, and now, instead of the mean for each group, you get the standard deviation for each group. I'm curious myself, so I'm going to try range. range outputs two values, the minimum and the maximum, so now for each site and sex I get the minimum and maximum age. It's an easy way to get fairly complete descriptive stats split by whatever variables you like. But notice the grouping variables don't have to be factors: sex, for example, isn't encoded as a factor, and it still works, because aggregate just splits on unique values. If mean, sd, or range outputs NA, there's a missing value in your data, and in general you'll want to use na.omit before computing the mean. na.omit removes the missing values, and then the mean is computed on what's left. I'll show that quickly. I'm going to make a vector, v_miss, that starts 1, 7, 2 and has an NA in the fourth position. All right, Lambert, we're going to get back to you; you'll find your way back, trust me. So here's a vector with a missing value. Say I want the mean of this vector: I just run mean(v_miss), and I get NA because there's a missing value in v_miss.
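Here is a small sketch of how a missing value affects mean and how na.omit handles it. The vector is made up (only its NA in the fourth position mirrors the example above); mean(..., na.rm = TRUE) is a standard alternative that gives the same result.

```r
v_miss <- c(1, 7, 2, NA, 9, 9, 3)   # made-up vector, NA in position 4

mean(v_miss)                # NA: a single missing value propagates
clean <- na.omit(v_miss)
clean                       # the six non-missing values, plus an na.action attribute
attr(clean, "na.action")    # records that position 4 was removed
mean(clean)                 # 31 / 6, about 5.17
mean(v_miss, na.rm = TRUE)  # same result without creating a new vector
```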
So instead I'd run mean(na.omit(v_miss)), and now it computes, because of what na.omit(v_miss) returns. See, here's our vector with all its values, but the missing value is gone. And this attribute, na.action, says the fourth value was removed: it tells me which values were removed and at what positions, and it returns my vector without missing values. So when I compute the mean, it's computed on a vector with no missing values. OK. We're going to come back to these. I know aggregate took me a while to get my head around, so if you don't get it right now, don't worry; keep this as your reference. Now, the table function doesn't just tell you the unique values; it counts them as well. Remember when I showed you the unique values for treated: treated has ones and zeros. Now that we've created our factors, let's say I want to know how many are in each group. I can use table(DF2$treatment_factor), and I have 50 controls and 50 treated. So table isn't just giving me the unique values. If I just ran unique(DF2$treatment_factor), it would give me Treated and Control, but table counts how many are treated and how many are control. You could also have used table up here when you were creating your factor, if you wanted to know the unique values: table(DF2$treated) shows there are 50 zeros and 50 ones. That makes sense; it should be exactly the same numbers as control and treated. Now the cool thing, and this is excellent for checking that you've encoded your factor correctly: you can run table(DF2$sex, DF2$site), for example, or better yet, table(DF2$treatment_factor, DF2$treated). That's a cross-tab table. Now we're counting how many controls are zeros: 50 of them.
All of the controls are zeros, and all of the ones are labeled Treated; that's also 50. None of the values you encoded as Control are ones. This is exactly right, because up here, where you encoded your factor, you said: if it's a 0, call it Control; if it's a 1, call it Treated. This confirms that's been done. So it's a cross-tab of your original and recoded columns. Whoops. All right, Reza has a raised hand. I have a quick question. Yes, please. When we check with table, say we didn't use na.omit when we converted to a factor, and there was an NA in site or treatment. When converting to a factor, does the NA get changed into one of the levels, or does it stay NA? Very good question. It's just kept as missing. That's a nice thing: when R computes means, standard deviations, and so on, it needs nothing to be missing, because it's creating a summary statistic. But when you're doing a factor conversion, a missing value just remains a missing value, so R handles that internally. Very good question. Tables are similar: by default, table leaves missing values out of the counts, so an NA won't be added to any group. If you want the missing values counted as their own category, add useNA = "ifany". Looks like... it may be that your data... oh no, it has been read in. Hmm, I'm not sure what's causing the error. You have a space after factor in one of them, instead of the parenthesis following directly; I wondered if that was the issue. See, there's a space in one of them. OK, I fixed that, but that doesn't fix the ones for sex and site. You may want to try reading in your data again.
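A sketch of using table to count levels and to cross-check a recoding, on made-up treatment codes that include one NA. It also shows that factor keeps the NA as NA, and that table leaves NA out of the counts unless you ask for it with useNA.

```r
treated <- c(0, 1, 1, 0, NA, 1)   # made-up codes, one missing
treatment_factor <- factor(treated, levels = c(0, 1),
                           labels = c("Control", "Treated"))

treatment_factor                          # the NA stays NA after conversion
table(treatment_factor)                   # Control 2, Treated 3: NA left out by default
table(treatment_factor, useNA = "ifany")  # adds a count for <NA>

# Cross-tab of recoded vs. original codes: off-diagonal cells should be 0.
table(treatment_factor, treated)
```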
It could be that... actually, I believe you've just downloaded the wrong data file. If you look at line four at the top, it says climate data 2; you want example data 2. So your code should actually be correct; it's just that those columns don't exist in the climate data. OK, thank you. Great, wonderful. Sometimes you've got it all right and it's just the wrong data. Thank you for asking. All right, tables can also be wrapped in bar plots. I'm going to run this sex-by-site table here: table of the sex factor and site. We have 25 males at site 1, 25 males at site 2, 25 females at site 1, and 25 females at site 2. Great, an even split; obviously they recruited this way. Now I can wrap barplot around this table, and that produces a bar plot that stacks the different groups. Here's what should show up if you run barplot around your table: you're essentially just plotting the table. Once you have this bar plot of your table, go ahead and click Yes. If you had any errors along the way, please share them on the Slack; I think it will help us all. Lambert, are you a little less lost now, or are you still in no man's land? Still lost, OK. Gabby, could you pull Lambert into a breakout room? Awesome, we've got a lot of yeses climbing up there; good to see. Now that you have a bar plot, feel free to play with it. Bar plots use the same base R plotting options as the box plot and histogram we made before, so give it a try: see if you can change colors, change axis labels, maybe add a legend. Reza, when you encoded your site, did you use 1, 2 as the levels or 0, 1? If it's 0, 1, it could be that no one ends up in the first level.
If your factor levels don't actually exist in the data, no one will be in that group, and values that don't match any level become NA. So you may need to recode your site factor. Nagla, I'm responding to you. Emma, check the table and see what values you're getting. If your site, or also your sex, isn't encoded correctly, for example if it's zeros and ones, that could be the reason. Let's see. Oh, sex and site look fine. For the bar plot, try using just sex and site instead of the sex factor; it looks like the factors maybe weren't encoded correctly. Just rerun the plot, Ram. If you rerun your plot, it starts fresh and the old legend is gone. Each legend you add is just pasted on top of the existing plot, so to refresh it, you need to plot it again. Awesome. I'm going to look for a nice base R plot options cheat sheet. I'm adding a link to a base R plotting cheat sheet; I'll put it in the Slack, and I also put it in our Google Doc. Here it is in the Google Doc: points, axes, lines, lab, main, sub, et cetera. But if you just Google it, there are many; I just took this top one. And this is what I mean: Google is your best friend when you're working with R. All right, excellent, it looks like almost everybody's got it. Don't forget to click Yes once you've got your plot up. In this case, I know which colors go with which sex because they're in the same order as the colors I specify. This is why factors are great: you know the order, and order matters a lot. If I have males first and females second, then the first color goes with males and the second color goes with females. Really good question, Christina. I'm just going to write "order" here.
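To illustrate why the level order decides which colour goes with which group, here is a sketch with made-up data and arbitrary colours. The names of a table follow the factor levels, and barplot assigns colours in that same order.

```r
sex <- c("M", "F", "M", "F", "F")                   # made-up data
default_order <- factor(sex)                        # levels sorted: "F", "M"
chosen_order  <- factor(sex, levels = c("M", "F"))  # levels in the order you give

names(table(default_order))  # "F" "M"
names(table(chosen_order))   # "M" "F"

# Bars, and the colours assigned to them, follow the level order:
# steelblue goes with "M" and tomato with "F".
barplot(table(chosen_order), col = c("steelblue", "tomato"))
```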
The order matches the factor levels. But yes, it's very confusing here, right? How would you know? The counts are the same too, so there's really no other way to tell. It would be much easier if one group had five and another had fifty. All right, tables are also great because you can use them to run chi-square tests. Just as you can wrap barplot around your table, you can wrap chisq.test around it. Here I have the same table, but I'm wrapping chisq.test around it. Actually, I have a different table here, without the treatment factor, but either is fine. If you wrap chisq.test around a table, you get a chi-square test with the chi-square statistic and a p-value. Try this with the other two comparisons, site and sex, and treatment and site, and click Yes once you've done two more chi-square tests. Awesome, we've got one. Just a quick statistical reminder about the chi-square test: we're testing for independence between two groupings in a cross-tab table. That's why we can run this test on the table: the null hypothesis is that the two groupings are independent. If they're not, that is, if the two groupings are associated in some way, you may see a statistically significant result. The chi-square test is computed from the table of counts itself, which is why this works. You don't have to run chisq.test this way, but it's a nice, clean way to do it that helps your understanding. So it's two categorical variables. Don't forget to click Yes once you have it. Very nice, looks like everyone's getting it; great work. OK, does this have to be done with factors, or does it just have to be categorical? That's a good question. I don't think it needs factors, because it's a table.
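The question of whether chisq.test needs factors can be checked directly. In this sketch, with made-up counts, the test gives the same statistic whether the table is built from plain vectors or from factors, because only the counts matter and reordering rows does not change the statistic. For a 2 x 2 table, chisq.test applies Yates' continuity correction by default.

```r
# Made-up counts: 20 men (15 at site 1, 5 at site 2),
# 20 women (6 at site 1, 14 at site 2).
sex  <- rep(c("M", "F"), each = 20)
site <- rep(c(1, 2, 1, 2), times = c(15, 5, 6, 14))

tab_chr <- table(sex, site)   # built from plain character / numeric vectors
tab_fac <- table(factor(sex, levels = c("M", "F")),
                 factor(site, labels = c("Site 1", "Site 2")))

res_chr <- chisq.test(tab_chr)  # Yates' correction is applied to 2 x 2 tables
res_fac <- chisq.test(tab_fac)
res_chr$statistic
res_chr$p.value
```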
So like we saw before, a table just divides up all of your unique values and counts them, so you shouldn't need a factor variable for this. Yeah, good question. If it isn't a factor, you want to make sure it's an actual grouping variable. If you put in a continuous variable, table() will just count the unique values, so most of the time you'll get a one in each group, because you'll have lots of unique continuous values. So it is important that it's a category, but it doesn't matter if it's a factor. Nice work. I'm going to move along and clear the yeses. I think we're close to everyone being there, and if not, you should be there very shortly. Now, back to bar plots. We can update this bland bar plot from earlier to be more descriptive using techniques from our box plot example, as I alluded to before. I also put a base R plotting cheat sheet link in the Slack. But as I showed you, if you just Google that, there are other versions of those cheat sheets, and some might be easier for you to read than others, so choose the resource that works best for you. But yeah, this plot is not very descriptive, especially if you show it to someone. Even I, who know the data, am not totally sure what I'm looking at here. For example, are the males or the females the darker gray or the lighter gray? So we want to make the labels larger, change the colors, and add a legend. We also want to move the bars beside each other, because we don't want them stacked like they are. So we call barplot(), start with the table, then put a comma after the table and start filling in more plotting options. You can press Tab after the comma. I'm actually going to skip past this here. What you can do is add beside = TRUE. The reason I wanted you to press Tab there: you can see here, I have my table, and I have an open parenthesis here.
I can tell this closing parenthesis matches it because RStudio highlights it; see, that's highlighted. Now I go after my comma and press Tab, and it gives me all these options for barplot(). There's width, the amount of space each bar takes up; I can choose the width and the height; there's legend.text; and there's beside, which is the one we really want. For beside, the help says it's a logical value: if FALSE, the columns of height are portrayed as stacked bars. That's the default, and it's what we're seeing on the screen right now. If TRUE, the columns are portrayed as juxtaposed bars, which just means they're beside each other. So that's what we want: we set beside = TRUE to put those bars next to each other. Next, we want one color for each level of the first variable in the table above. So with col, dodgerblue will be the first factor level and darkorchid will be the second. The legend text recapitulates the levels we set, and you can check them in your own code. Maybe you set them the other way around, in which case you want to swap these. The levels I set in my code are M first and F second, so I match that exactly. If you're not sure about your levels, let's check: levels() on the df2 sex factor. It tells you the levels, with their labels, in the order you put them in. Okay, so males first, females second here, so blue is for males and dark orchid is for females. We want our axis labels to be one and a half times larger again, and we want to set our axis limits to make it easier to fit everything into the plot. Like we saw before, it was quite tight, so we want to add some extra space here.
So ylim = c(0, 40) makes the y axis go up to 40, even if the actual data doesn't reach that high. We're just making a taller graph to fit the legend into. We're also going to make our axis text larger, so it's easier to read the numbers themselves. Down here, we're making the legend text larger, and we're placing the legend at the top of the limit we set: over to the right and at the very top. Okay, so go ahead and give that a try. Obviously use whichever colors you like, and play with where you put the legend. You can even play with the ylim and the cex values: maybe you want them much larger, or even smaller; they can be less than one if you want really tiny text. Go ahead and click yes once you've got your revised graph up. Awesome, we've got two people already, very nice work. Another thing you can do: instead of writing out your factor levels in the legend, you can use the levels() function to supply them. So instead of writing c("M", "F"), you could write levels() of the df2 sex factor, and it'll give you that vector of levels automatically. So that's another option. Excellent, we've got a few people with a nice bar plot up. Don't forget to click yes once you've got it. You guys are going to be pros at plotting by the end of the day. Ah, it doesn't look right. Do you want to put it up for us to see? Yeah, great. Yeah, that looks right, Dalia. What's happening there is the legend's a bit far over. Also, you've run the legend a few times, so it keeps adding it again and again. Try moving the legend along the x axis, a little over. You might even want to make your ylim larger: if you make your whole plot frame larger, it might fit a bit better. But yeah, that's really what's happening.
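A sketch of the revised bar plot walked through above, on simulated data. The variable names, the site labels, and the choice of cex.names and cex.axis for "larger axis labels and text" are assumptions; the slide's exact arguments aren't shown here. The legend reuses levels() so its order always matches the color order.

```r
# Hypothetical data (simulated; names are assumptions): 60 people, sex factor with M first
set.seed(42)
sex_factor <- factor(sample(c("M", "F"), 60, replace = TRUE), levels = c("M", "F"))
site <- sample(c("Site1", "Site2", "Site3"), 60, replace = TRUE)

tab <- table(sex_factor, site)              # rows = sex (first variable), columns = site
bar_cols <- c("dodgerblue", "darkorchid")   # one colour per row, in level order (M, F)

mids <- barplot(tab,
                beside = TRUE,      # bars side by side instead of stacked
                col = bar_cols,
                ylim = c(0, 40),    # taller y axis leaves room for the legend
                cex.names = 1.5,    # larger group labels under the bars
                cex.axis = 1.5)     # larger y-axis numbers

legend("topright",
       legend = levels(sex_factor),  # same order as bar_cols, so colours match groups
       fill = bar_cols,
       cex = 1.5)
```

barplot() returns the bar midpoints (a matrix when beside = TRUE), which can be handy for placing text on the bars later.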
What's the difference between using legend.text and adding the legend separately? That's a great question, Christina. Let's try just using legend.text. The help says it's a vector of text used to construct the legend for the plot, or a logical indicating whether a legend should be included. Let's see. Yeah, it works fine. So you can do that instead of typing your legend out here. The main difference is that you won't have the same direct control over the cex, or over where to put the legend. That's the only difference, but yeah, it should be fine to do. Okay, thanks. I saw it in your script, which is why I wondered. Yeah, let's try again here and see what this does. Yeah, it's pretty good. Okay, thanks. Yeah, of course. Yeah, one caveat with that approach is that it's harder to select the exact position, and if you wanted different formatting between the legend and the bar plot, or any other plot, in terms of size and font and that kind of thing, that would be quite difficult to manage. Yep, so that's an additional option there. And here's something that's really nice: if you just press Tab again, like here with barplot(), you can arrow down through the options. Dahlia, great question. I'm going to create the problem and then show you how to solve it. So here we've got a legend, and then I'm like, oh, maybe I want this to use levels() on the df2 factor. So I do it again. And then, oh, you know what? I actually want it over there. Nope, that's not the right place, so I make another one here. Okay, that's fine. And now I have a lot of legends. The way to get rid of them is to run your plot again and then run the legend again, so you only have one there. All right guys, don't forget to click yes once you've got it. It looks like we're almost all there. Really excellent. Okay, all right.
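A sketch of the legend.text route discussed above, on the same kind of simulated data (names are assumptions). With legend.text = TRUE, barplot() labels the legend with the table's row names. barplot() also has an args.legend argument, a list passed on to legend(), which gives back some control over position and text size without a separate legend() call.

```r
# Hypothetical data (simulated; names are assumptions)
set.seed(42)
sex_factor <- factor(sample(c("M", "F"), 60, replace = TRUE), levels = c("M", "F"))
site <- sample(c("Site1", "Site2", "Site3"), 60, replace = TRUE)
tab <- table(sex_factor, site)

mids <- barplot(tab,
                beside = TRUE,
                col = c("dodgerblue", "darkorchid"),
                ylim = c(0, 40),
                legend.text = TRUE,                              # legend labels = row names (M, F)
                args.legend = list(x = "topright", cex = 1.5))   # extra arguments for legend()
```

Because the legend is drawn as part of the barplot() call, rerunning this line always gives exactly one legend.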
So I'm gonna go ahead and move on, but also, thank you everyone for posting your errors and sharing them with everyone. It's really helpful for us all to work through them together and ask questions. Okay, so this is something like what you should be seeing. And from here, actually, sorry guys, we're going to go on a short break to keep to our normal break schedule. We'll take about a 10-minute break. So we've done this a lot here; I'm just showing the output. If we run View() on df2 (df2 is what I'm calling this data), we see that we have five markers, okay? This is common, right? You'll have lots of different gene expression values, maybe lots of different microRNAs, whatever you have. So these are biomarkers of some kind, and you want to see which, if any, are associated with the treatment, that is, whether these biomarkers change in response to the treatment. So first, let's plot these markers against each other to see if any of them are correlated. That's what I mean when I say plot them against each other. What I'm showing here is a pairs plot, okay? You're going to use the pairs() function to create a scatter plot matrix: essentially a scatter plot of all against all, all five markers against all five markers. The argument to pairs() here is a matrix or data frame of numeric columns, so we want to take just those five marker columns by subsetting, okay? Remember, for data frame subsetting, rows go before the comma and columns go after the comma. If we leave it blank before the comma, we get all the rows. So here we're selecting all the rows, and for the columns, we're selecting any column whose name contains the pattern marker, okay? So that's what this is giving you.
So if you highlight and run grep(pattern = "marker", x = names(df2)) to see which columns grep is indexing, you'll find the indices that match the pattern, okay? We want all rows, so we left that blank, and we want the df2 columns that have marker in the name. I'm going to show you this in the code and then come back to this. Here's grep(pattern = "marker", x = names(df2)). I'm just highlighting this inner bit, and what it gives me is these column numbers. If I go to df2, which is already open in my viewer, and count the columns, one, two, three, four, five, six, seven, eight, nine, ten, the ones being selected are six through ten. So instead of using the pattern, we could also have said, I just want columns six, seven, eight, nine, ten. Oops, sorry, 10. And this will also give me those columns, all right? So those are the only columns of df2 that get selected. That's what it's doing there: it's just finding the columns for you and then selecting them, all right? So once you've been able to do this, go ahead and click yes. You should get a plot out, nice. If some of you work on the command line, you may recognize grep; it's a common regular expression tool. There are other ways to do this beyond what I just described, so feel free to explore those if you've already been able to do this. Also, pairs() is another base R plot, so you can experiment with changing things about the plot itself. It's a little bit different, you'll see, but you can certainly make changes to it. You could also look at the base R cheat sheet I sent and see if there are changes you'd like to make, because now you've got scatter plots, so you can change things about the points: point sizes, colors, things like that.
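A self-contained sketch of the grep() subsetting above. The data frame is simulated and its column names are assumptions, laid out so that the five markers sit in columns 6 to 10, as in the session. grep() returns positions by default; value = TRUE returns the matching names instead.

```r
# Hypothetical wide data frame: 5 descriptive columns, then 5 marker columns (6-10)
set.seed(1)
n <- 100
df2 <- data.frame(ID = 1:n,
                  sex = sample(c("M", "F"), n, replace = TRUE),
                  site = sample(c("A", "B", "C"), n, replace = TRUE),
                  treatment = sample(c("control", "treated"), n, replace = TRUE),
                  age = round(runif(n, 20, 70)),
                  marker1 = rnorm(n), marker2 = rnorm(n), marker3 = rnorm(n),
                  marker4 = rnorm(n), marker5 = rnorm(n))

# Positions of the column names that contain "marker"
marker_cols <- grep(pattern = "marker", x = names(df2))
marker_cols                      # 6 7 8 9 10

# value = TRUE returns the matching names themselves
grep(pattern = "marker", x = names(df2), value = TRUE)

# All rows (blank before the comma), only the marker columns (after the comma)
pairs(df2[, marker_cols])

# Same result here, but hard-coded numbers break if the column order ever changes
pairs(df2[, 6:10])
```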
All right, I'm gonna make sure everyone's got it, so I'll wait two more minutes. Nice: global regular expression print, awesome. grep looks for a pattern, and here it's outputting the location of that pattern. The pattern it's looking for is marker, and the vector it's searching is the names of your data frame. So it looks across the names, and when it finds that pattern, it's like, oh, it's at element six, keep that; it's at element seven, keep that. So it's very, very useful for these things. Alrighty guys, I'll continue on here and clear the yeses. A few people haven't gotten it yet, but I think it's okay; you'll get it on the flip side. This part isn't essential, but I think this is a really nice plot to summarize your data, especially if you find that the markers are very correlated with each other. Here, the markers actually seem quite uncorrelated, so that's good to know: we essentially don't have collinearity. If you have several predictors for a linear regression, for example, this plot can show you whether those predictors are correlated with each other, which can make it hard for the regression to separate their individual effects. That's good to know upstream, so these pairs plots can be very useful to run on your data. So now let's plot the markers against our treatment. Can I see the grep and column number code, please? Yes, I'm going to do this quickly, and then, Reza, I'll put that code up for you, okay? So next, we're going to plot each marker by the treatment factor, okay? Here we're making box plots with our data set to df2, but first, we want multiple plots all in one plot pane. So we set the plot window to one row by five columns of plots, which means we'll have five plots next to each other in a row, all right?
That's done using par(mfrow = c(1, 5)). It wants a vector, rows by columns: one row, five columns of plots. Oops, pardon, there we go. Then you make five box plots, and they're drawn all in a row here, all right? So once you've done that, go ahead and click yes. And Reza, here you go, I'm putting this code up here, right there. You want to keep track of your parentheses on this one: parentheses around names(), parentheses for the grep(), then your brackets, and then your parentheses for the pairs(). Nice, we've already got one. It's pretty easy with copy and paste, and it should output five plots all in one pane. Tomorrow we'll explore how to do this more elegantly and quickly with a loop. Excellent, it looks like a lot of people are getting this one. I'm missing the treated label. Ah, yeah, that's because it's squished: if a label doesn't fit, R just drops it. One way to fix that is las = 2, which rotates your labels so they're perpendicular to the axis, up and down. So here, I believe it's las = 2; let me check this. I need to make my figure margins quite large. Yeah, so with las = 2, control and treated are both there, whereas here it just drops one of them. Let me show the command again: las = 2. Yeah, Alexander, great question. We specify the data frame here, data = df2, which is why you can just use the column names in the formula. But you don't have to specify data = df2: you could write the formula with dollar signs instead, df2$marker1 tilde df2$ and then your treatment factor column, and that would work just fine. Awesome, don't forget to click yes when you've got it. Great, it looks like we're almost all there. Really excellent questions, certainly things that everyone runs into. Ah, so Nagla, that error is "figure margins too large," and see, I'm even getting it here.
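A sketch of the one-row, five-column layout with las = 2, on simulated data (the treatment_factor column name is an assumption). par() returns the previous settings, so saving them and passing them back afterwards restores the single-plot layout. If R reports "figure margins too large," make the plot pane bigger before running this.

```r
# Hypothetical data (simulated; column names are assumptions)
set.seed(1)
n <- 100
df2 <- data.frame(treatment_factor = factor(sample(c("control", "treated"), n, replace = TRUE)),
                  marker1 = rnorm(n), marker2 = rnorm(n), marker3 = rnorm(n),
                  marker4 = rnorm(n), marker5 = rnorm(n))

old_par <- par(mfrow = c(1, 5))   # 1 row by 5 columns; par() returns the old settings

# las = 2 puts the group labels perpendicular to the axis so both fit
boxplot(marker1 ~ treatment_factor, data = df2, las = 2)
boxplot(marker2 ~ treatment_factor, data = df2, las = 2)
boxplot(marker3 ~ treatment_factor, data = df2, las = 2)
boxplot(marker4 ~ treatment_factor, data = df2, las = 2)
boxplot(marker5 ~ treatment_factor, data = df2, las = 2)

# Equivalent to the first plot without data =, pointing at the columns with $
# boxplot(df2$marker1 ~ df2$treatment_factor, las = 2)

par(old_par)   # back to one plot per window
```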
The "figure margins too large" error comes up because the plotting pane is too small. If you just click the four-square arrow there and make the pane larger, it will resolve that issue for you. All right, everyone, I think we're okay to move on. I'm going to click over to the next screen, where you can see this one had the plotting pane blown up to a huge size, so it was fitting both control and treated. We had a great question while everyone was working: why is a label being dropped there? It's dropped just because of the size. Notice that as I made this window larger, I now see labels on both sides. So R just cuts out whatever won't fit. It's kind of annoying behavior, frankly, but if you want to rotate the labels, you can use the argument las = 2. That's what I did on this first one, and now I've got control and treated there, but you see the labels are cutting into the plot. I wonder if it's fixed now. Yeah, it's still doing it, so then we might want to change how much room the treatment factor labels get. Gorgeous. Wonderful. Can you put them on an angle as well, or just straight? You almost certainly can. I'm not 100% sure how you do that, so I suggest you Google it. Otherwise, I'll Google it later and post the solution. No, I can look it up, I was just curious. Awesome. Yeah, I don't know off the top of my head; I just know las = 2 because I've had this issue so many times. Great. So that's how you get multiple plots in one window using base R. They don't even have to be the same kind of plot. These are five box plots, but one could be a box plot, one a histogram, one a strip chart, et cetera. You can add different plots; they're just being arranged in that window using par(mfrow = c(1, 5)).
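On the question about angled labels: one common base R approach is to suppress the x-axis labels and draw them yourself with text(), rotated via srt. This is a sketch on simulated data; the 45-degree angle, the margin sizes, and the 5% offset below the plot are arbitrary choices, not values from the session.

```r
# Hypothetical data (simulated)
set.seed(1)
treatment_factor <- factor(sample(c("control", "treated"), 100, replace = TRUE))
marker1 <- rnorm(100)

old_par <- par(mar = c(6, 4, 2, 1))   # bottom, left, top, right margins (lines of text)
bp <- boxplot(marker1 ~ treatment_factor, xaxt = "n", xlab = "")   # no x-axis labels yet

positions <- seq_along(levels(treatment_factor))
axis(1, at = positions, labels = FALSE)   # tick marks only
usr <- par("usr")                         # plot limits: x1, x2, y1, y2
text(x = positions,
     y = usr[3] - 0.05 * (usr[4] - usr[3]),   # slightly below the bottom of the plot region
     labels = levels(treatment_factor),
     srt = 45,    # rotate the text 45 degrees
     adj = 1,     # right-align so each label ends at its tick
     xpd = TRUE)  # allow drawing in the margin, outside the plot region
par(old_par)
```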
So par(mfrow) just fills in those five spots with five different graphs or plots. All right, so you may only see control, or neither label, if your plot pane's too small, and las = 2 will rotate the labels for you. Now, can we do this using ggplot? We can, and I'd argue that in this case we can make a much nicer plot with ggplot. The difference is that with base R, we have total flexibility for each of these plots. So in this specific case, I'm going to show you how to make a ggplot like this, but if you want a different kind of plot in each of the individual panes, base R is probably going to work better for you. Say you wanted a box plot here, a scatter plot here, a bar plot here, et cetera; then you probably want to use base R. I'm going to illustrate that really quickly. Say you don't want just box plots. You want a box plot, then a scatter plot, and a scatter plot is just plot(). I'll say marker one versus marker two. Then I want my next one to be another box plot, of marker two, and then I want the next one to be a histogram. All these spaces and line breaks are just for me. And say I want another histogram, and I want this one to be marker one and marker two. Okay, so these five slots are already filled, and I'm going to fill them in again with the new plots. I'll plot this one again. Now I'll plot this one. "Marker one not found," I see. Let's see if it'll work this time. Nope, so plot() here doesn't like the data argument. If that happens, oops, sorry guys, if that happens, just use the dollar sign. There, now I have a scatter plot. Now I have another box plot, and then this one: marker one not found. So again, if it does that, just use the dollar sign. I would have assumed that would work. So now I have a histogram of that one there. And do you see how, oops, sorry, it's just filling in these slots?
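A sketch of mixing plot types in one window, here with a 2 x 2 layout on simulated data (column names are assumptions). It also shows why the "not found" errors appear: boxplot() and the formula form of plot() look up bare column names in data =, but plot(marker1, marker2, data = df2) evaluates marker1 before data is ever consulted, and hist() has no data argument at all, so those need the dollar sign.

```r
# Hypothetical data (simulated; column names are assumptions)
set.seed(1)
n <- 100
df2 <- data.frame(treatment_factor = factor(sample(c("control", "treated"), n, replace = TRUE)),
                  marker1 = rnorm(n), marker2 = rnorm(n))

old_par <- par(mfrow = c(2, 2))   # 2 rows x 2 columns, filled left to right, top to bottom

boxplot(marker1 ~ treatment_factor, data = df2)   # slot 1: box plot (formula + data works)
plot(marker2 ~ marker1, data = df2)               # slot 2: scatter plot via the formula method
hist(df2$marker1)                                 # slot 3: hist() has no data argument, so use $
hist(df2$marker2)                                 # slot 4

par(old_par)
```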
The window has five slots, and you can put whatever base R graphs you want in them. So this is where it gets really flexible, and it's really nice to work with base R this way when you want different plots in multiple panes. I'm going to do one more demo where I change the layout. Let's say I want one, two, three, four, like a square, two by two. I can make this two rows, two columns. Then I just want these four, so now I have four slots to fill in. I run that par(mfrow) line here, so now it's going to give me this: one, two, three, four that I'm filling in. Okay, there's my second one going in, my third one going in, and my fourth one going in. Again, these can be whatever plots you want. You're just saying, I want two rows and two columns of plots, and they get filled in one, two, three, four, like you read a book. Okay, so that's just to show you the flexibility of this kind of approach. But going back to this one: because we're plotting the same kind of data with the same type of plot, and just splitting it across different markers, we can do this really elegantly in ggplot, okay? So let's do it. First we need to reshape our data. This is where it's really important to think about how our data is actually set up. Our data is currently in wide format: we have scores on different variables in different columns, marker one, marker two, marker three. What we want to transform it into is one index variable that says marker one, marker two, marker three, and then another variable, value, which is just the marker's value, all right? So a long column. This diagram shows three markers, but you can think of it extended to five, right? We're essentially stacking these on top of each other.
So we're going to make one long column that holds the values the markers take on, and then another column that tells you, am I looking at marker one, two, three, four, or five? Okay, so in long format, the scores on different variables are all in a single column, and a separate column gives an index, or reference, telling us what we're looking at. This is done using the melt() function, okay? It comes from the reshape package. We installed this package at the beginning, I believe, or you would have even before we started. So go ahead and load reshape with library(). You should see a check mark next to it in the Packages tab. Then you're going to create a new long data frame, okay? df2melt is what I'm calling it here. You want the data argument to be the wide data frame, df2. Then the ID variables: these are the ones we want to keep per person, the ID, the site factor, the treatment factor, and age. Each of these will be repeated for every marker that gets stacked, because each individual keeps the same site factor, the same age, and so forth for all of them, okay? Then the measure variables: these are the ones that get made into one long column, marker one through marker five, all right? These are the columns we're just going to stack. And the variable name is the name we want for the reference column, which will just hold these column names. We're going to name that marker, all right? So that's what our reference column will be. Once you run this and you see df2melt created in your environment, go ahead and put a yes, all right? Nice, I see some have it already, good work.
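To make the wide-to-long idea concrete, here is a base R sketch of what the melt step produces, built by hand rather than with the reshape package (so it is an illustration of the stacking, not reshape's API). The data frame is simulated and its column names are assumptions. Each marker contributes one 100-row slice; the ID variables repeat in every slice, the marker's name goes in a reference column, and its values go in a single value column.

```r
# Hypothetical wide data frame (simulated; column names are assumptions)
set.seed(1)
n <- 100
df2 <- data.frame(ID = 1:n,
                  site_factor = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
                  treatment_factor = factor(sample(c("control", "treated"), n, replace = TRUE)),
                  age = round(runif(n, 20, 70)),
                  marker1 = rnorm(n), marker2 = rnorm(n), marker3 = rnorm(n),
                  marker4 = rnorm(n), marker5 = rnorm(n))

id_vars <- c("ID", "site_factor", "treatment_factor", "age")
measure_vars <- paste0("marker", 1:5)

# One slice per marker: ID variables repeated, marker name as reference, its values as value
slices <- lapply(measure_vars, function(m) {
  data.frame(df2[id_vars], marker = m, value = df2[[m]])
})
df2_long <- do.call(rbind, slices)   # stack the five slices on top of each other

dim(df2_long)     # 500 rows, 6 columns
head(df2_long)
```

Five stacked slices of 100 rows give the 500 observations of six variables mentioned later in the session: the four ID variables, the reference column, and the value column.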
Just to cover this really quickly: if you run names(df2), it prints out the column names, so you can copy and paste them if that's easier and faster and helps you avoid spelling errors. That's just the names() command we covered earlier. Great job, Ren, I think that's right. So what's the...? That's it, it's in your environment, you created it. Oh, I just created it, okay. You just created it, yeah, I think you did it right. Good job. I had a different idea of what the outcome would be, okay. Yeah, it's anticlimactic: a lot of work for just a little. All right, thanks. No worries, right? And go ahead and click yes once you've got it. So again, it should just show up in your Environment pane, and that's it; ideally it runs error-free. So this is like a partial unpivot, because you're not doing it for everything, just specifically for the markers. That's exactly right, yeah. You select the things you're going to unpivot, so to speak, and keep everything else in the same format. Dahlia, could you print the warnings? See how it says use warnings()? If you run that exact command, it'll show you the warnings, and if there are warnings, that's actually okay; it'll be just fine. What you're showing looks like it may be just fine. It also looks like you missed marker one, so you have fewer rows than I would expect, but otherwise, yeah. Okay, so about those warnings: sometimes something gets stuck in the graphics state. I'm going to show everyone this really quickly. Sometimes you get lots of warnings, and I can't reproduce them easily, but if you're getting warnings like the ones Diego is showing here, with doTryCatch and so on, you can get rid of them by simply running dev.off().
What that does, though, is erase all of your plot history. It just restarts your plotting graphics, because something happened at some point that R wasn't happy with, and now it keeps throwing those warnings. The warnings themselves are fine; if you're okay with getting them, they're not hurting anything. But you can type dev.off() and press Enter, and see, it just erases it all. That also deals with the doTryCatch warnings you're seeing here. The "invalid graphics state" part is what tells me that's the issue. Aha, yeah, so Dahlia, in what you're showing, you don't want to run just the parentheses. If you want to see the warnings, type warnings() exactly as the message shows, including the closing parenthesis, and it'll show them. I'm getting these doTryCatch invalid graphics state warnings, so I just run dev.off(). If it can't find the function melt, that means you need to run library(reshape). And if it says there's no package called reshape, then you need to run install.packages("reshape"). That's right, yes, Nagla. You'll just have your melted data frame, which I'll go ahead and create here with melt(). So you have df2melt, and it should have 500 observations: you're basically taking slices of your data frame, 100-row data frames, and stacking them on top of each other, and there are five of them because we have five markers. So we know it should be 500 observations of six variables. Note that the number of variables went down, because we turned five of those variables into one value column and then added a variable as a reference. We could view it if we wanted to, and you'll see a variable column and then a value column here. I set the variable name to marker.
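A sketch of the graphics reset described above. pdf(NULL) opens an off-screen device only so the example runs outside RStudio; in RStudio you would simply call dev.off() on the Plots pane. graphics.off(), which wasn't mentioned in the session, is the base R function that closes every open device at once.

```r
pdf(NULL)        # off-screen device, just so this example runs anywhere
plot(1:10)

warnings()       # lists the warnings from the last top-level call, if any

dev.off()        # closes the current graphics device (in RStudio this clears the Plots pane)
graphics.off()   # closes all remaining devices, if several were open

dev.cur()        # "null device" 1 means no graphics device is open
```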
Looking at df2melt, though, it didn't take my variable name. All right, very nice. All right, I'll go ahead. Here, I just want to make sure... wonderful. Okay, so it looks like we're almost there. I'm going to keep going, and if you're having issues, we can help you on the next break, because these next few steps should be very easy, I think. All right, so now you can make your box plot. You're going to use ggplot again, but here we're using the melted data frame. Your treatment factor goes on the x axis, and value is going to be your y. For facet_grid, I'm saying marker here, but what I see is that it didn't take marker as my variable name. When I ran it (I'm just going to do it again), df2melt: if you check it out and it doesn't say marker, just variable, then the code I'm showing isn't going to work. After the tilde in facet_grid, you want the column that delineates which marker is shown, so here it should be variable, not marker, okay? Basically we just make a geom_boxplot() and fill it by our treatment factor, so one color for one treatment group and another color for the other. Then facet_grid with the tilde: here I say marker, but it should actually be variable. Then go ahead and plot this, and you'll see it gives you something very similar to the box plots we produced before, pardon me. Yes. Do you want them rotated in ggplot or in base R, Hannah? Because absolutely anything can be rotated or moved. Yes, yeah. So that one's going to take a bit more searching. I suggest you just Google it. I literally Google it and copy and paste that code every time I use it, because it has you rotate the labels by an angle you choose, and then you adjust where they sit, height-wise, as well. It may be in the code coming up, but I don't think so. So just Google exactly "rotate x axis labels ggplot."
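If the reference column comes out as variable instead of marker, you can either point facet_grid at variable or rename the column afterwards, as in this sketch on a small made-up data frame. One possible cause, and this is an assumption since the exact call isn't shown here: the reshape package's melt() spells the argument variable_name, while the newer reshape2 package spells it variable.name, and a spelling that doesn't match the loaded package can end up silently ignored. Check ?melt.data.frame for the version you have loaded.

```r
# Small stand-in for a melted data frame whose reference column came out as "variable"
df2melt <- data.frame(ID = rep(1:3, times = 2),
                      variable = rep(c("marker1", "marker2"), each = 3),
                      value = c(0.2, 0.5, 0.1, 0.9, 0.4, 0.7))

names(df2melt)                                              # "ID" "variable" "value"
names(df2melt)[names(df2melt) == "variable"] <- "marker"    # rename just that column
names(df2melt)                                              # "ID" "marker" "value"

# Or name it when melting; the argument spelling differs by package:
#   reshape:  melt(df2, id.vars = id_vars, measure.vars = measure_vars, variable_name = "marker")
#   reshape2: melt(df2, id.vars = id_vars, measure.vars = measure_vars, variable.name = "marker")
```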
And if you find the rotation code, please add it to the Slack; I think everyone will love that. Also, if you have trouble finding it, let me know. This should give you a box plot very similar to the five-paned box plot we made before in base R, okay? Wonderful, it looks like four of you got it, very nice. Feel free to change the colors, maybe with scale_fill_manual: no more salmon and turquoise. There are all kinds of things you could change. And don't forget to click yes once you've got that plot. I'm going to make sure we're all on the same page before we move on. Naglet, check whether your column name is marker; it might be variable. It is marker. Okay, it is marker. And your treatment factor is capitalized the same way too? Yeah. Okay, try dev.off() first, and then try to plot it again. Remember dev.off(); here, I'll type it in. I just lost you, there we go. dev.off(). Try running that and then plotting it again. Okay. Also see if making your plot pane larger helps, but usually you'd get a specific error if that were the case, so I don't think it's necessary. Yeah, no worries at all. It looks like df new didn't get defined; try df2. Gorgeous, Dahlia, that's right. And once you have it, don't forget to click yes. Aha, so Ran, you made a beautiful plot, but it looks like you're running scale_fill_manual() without the rest of your ggplot call. Instead of adding it on with a plus sign, you ran it on its own, so it's just printing out a lot of stuff. You'll want to add it on to the plot with a plus sign. Excellent. Looks like we're getting there. Great, glad to hear it, Ran. I'm going to wait for a few more people to get this main plot, and then we'll go from there. How would this be different if there were more than two treatment groups? Yeah, there would be additional box plots.
So instead of two colors and two box plots next to each other, there would be, say, three or four box plots, still split by marker. And again, if you're one of the 11 who have already got it sorted and looking great, try to customize your plot. Change things about it to make it look better, because the default is not particularly nice. Mm, so Ran, las = 2 is base R. To rotate the labels here, you'll have to use a ggplot option, so go ahead and try to find it online and see how you would rotate the axis text. Just Google "rotate ggplot axes" and you'll find the exact piece of code; you can literally copy and paste it right in. I'll show you all after this, and you'll even see my Google search; that's how I sort it out. Wonderful, Diego, I'm glad that worked. Yeah, dev.off(), man; it solves basically all the graphics problems. Yeah, so it looks like you don't have a column called treatment with a capital T. So go ahead and view your data frame: if you go to df2 and click this icon, you can see your data frame's columns. Maybe you have a treatment factor column, or you can use treated. But you want to make sure it's a column that exists in your data frame, because that's what those errors are saying. And I'm going to run this one: if I keep marker like I showed before, it's not happy with me, because my run named it just variable. So here, oops, no, I'm going to do this again. Marker, df2melt, let me check it. It's still called variable, so fine, I'll use variable here, and this is what you should be seeing. All right, so similar to what we did before, where we filled the window with multiple box plots, ggplot has a really nice feature called facet_grid(), which automatically splits your graph by the levels of a factor. Here the markers are the different groups, and we're splitting by them.
ggplot is treating this as a factor variable and plotting the groups in the order of its levels, which here is the order they appear in your data frame. If you don't like the order here, you need to make a new factor and order its levels differently, okay? So I'm gonna repeat that. Let's say we actually don't want it ordered marker one, two, three, four, five, et cetera. What I would do here is take df2_melt and make a new column in df2_melt called marker_factor. This is gonna be a factor of my variable column. The reason I'm using variable here is that I want to make a factor out of this column, okay? And I want it to be ordered differently than one, two, three, four, five, right? So here I'm gonna say factor of this, and then I want levels. Because my levels are the same as my labels, I don't have to specify labels here. So I'm gonna say marker three is, let's say, what I want first, then marker one, marker two, marker four, and marker five, okay? So now I've got a new factor, and when I make my box plot again, I'm gonna change this to marker_factor, and it should reorder them. So now marker three is first. Often you'll find that the default is not the order that you actually want, and the way to fix it is the factor function: make your own factor and order its levels the way you want to see them. Three, one, two, four, five: this is the exact order that I specified here, and then I changed the plot to use marker_factor, okay? Alrighty, nice. So as before, I'm just gonna check the Slack. Nice, there you go, Ran. So Ran has added code for changing the axis rotation. So that's how you would rotate the axis text. We want to update the colors and the legend title. We want to update the axis labels, and we have the same theme from before. So remember we set the theme to theme_classic(). That's why it's a white background.
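The reordering step needs only base R. In this sketch the small data frame is invented; only the marker names and the three, one, two, four, five order come from the session.

```r
# Base R sketch; df2_melt here is a made-up stand-in for the melted data.
df2_melt <- data.frame(
  variable = factor(rep(paste0("marker", 1:5), times = 2)),
  value    = c(0.2, 0.5, 0.9, 0.1, 0.4, 0.3, 0.6, 0.8, 0.2, 0.5)
)

# Default level order: marker1, marker2, ..., marker5
print(levels(df2_melt$variable))

# New factor with an explicit level order. Labels default to the levels,
# so they don't need to be given separately.
df2_melt$marker_factor <- factor(df2_melt$variable,
                                 levels = c("marker3", "marker1", "marker2",
                                            "marker4", "marker5"))
print(levels(df2_melt$marker_factor))
```

Using x = marker_factor in aes() then draws marker3 first. One thing to watch: any value whose spelling doesn't exactly match one of the levels becomes NA in the new factor.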
If we hadn't done this, it would have had a gray background, because that's ggplot2's default theme (theme_gray()). So you can update the theme if you'd like. Go ahead and update these axis labels, the colors and the legend title, the axis text, and even how it's rotated: maybe rotate these labels 90 degrees so they run up and down instead of lying flat horizontally. Go ahead and do that. I'm gonna clear all the yeses, so just go ahead and click yes once you've done that. And for those of you who are still trying to get this graph, feel free to paste in screenshots of your errors, or anything you're seeing that's not working correctly, and we can keep helping you. That's right, Nagla, beautiful. And don't forget to click yes once you've got these all updated. All right, what do we see here? So yeah, here we go. Just make sure that you load reshape with library(). Yeah, I'll paste those lines in for you, Carol. I'm gonna post the master script later, Carol, but I'll put a few of the next lines in here. So there's the box plots and then the ggplot. This one.
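The customizations listed in this exercise can be sketched in one ggplot call, assuming ggplot2 is installed. The data frame, colors, and label text are placeholders, not the session's values. The rotation line is the common angle = 90 idiom; it comes after theme_classic() so the complete theme doesn't overwrite it.

```r
# Sketch only: assumes ggplot2 is installed; data, colors and labels are placeholders.
library(ggplot2)

set.seed(3)
# Hypothetical long-format data standing in for the melted example data
df_long <- data.frame(
  marker    = factor(rep(paste0("marker", 1:3), times = 4)),
  treatment = factor(rep(c("control", "treated"), each = 6)),
  value     = runif(12)
)

p <- ggplot(df_long, aes(x = marker, y = value, fill = treatment)) +
  geom_boxplot() +
  scale_fill_manual(values = c("grey70", "darkorange")) +   # custom fill colors
  labs(x = "Biomarker", y = "Measured value",               # axis labels
       fill = "Treatment group",                            # legend title
       title = "Marker levels by treatment") +
  theme_classic() +   # white background; ggplot2's default theme_gray() is grey
  # Rotate x-axis text 90 degrees; hjust/vjust line the labels up with the ticks
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

# print(p)
```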