Alright, so I just want to introduce myself. I'm Lauren, and I'm the instructor for this workshop. I've been working with R for, like I said in the Slack, coming up on a decade. This is my second year teaching this workshop, and I also TA'd for Boris Steipe when he taught it before me, so I'm really excited to be continuing on, teaching a new generation of R users here. Without further ado, I think we're ready to go. One thing: this is under a Creative Commons license, so if you attribute to us you can share and remix these slides and use them yourselves, and we encourage you to do so. My goal for this workshop is that you will all be able to read any of your data into R and know how to do so. You'll be able to inspect it within the R environment, using functions like str (structure), dim (dimension), length, and View. You'll be able to conduct basic statistical analysis, or find the statistical analysis you'd like to do and do that. You'll more fearlessly debug the errors you find in your R scripts, so when you get an error message I don't want you to break out into a cold sweat quite so quickly. Maybe you'll be able to cool your nerves and find your way through more easily. And then you'll create and modify publication-quality R plots. I also use Python, the basic Linux command line, and other programming software, and I don't think anything quite approaches R's plotting capabilities, so I'm really excited to share that capability with you, and I hope that you'll be able to use it yourself for publications.
I'd also like for you to seek out and use R packages. There's a huge diversity of specialties among the people attending this workshop, and even I have used R for many different things. There's a massive trove of R packages out there that let you do so many different things in your R environment: visualize your data, analyze your data, all kinds of things. So I'd like to give you a peek into that world, so that you can go find what will work best for your data and your analyses and then go use it. I really wanted to pay homage to Ellen van der Plas. Years ago we created an R workshop that many of these slides are based on. She's at the University of Iowa now, and she's giving R workshops out there herself that are quite similar to these slides. I had this in the last one, but what do you guys think, Francis and Rashad, about doing this this year: having notes that we all share here for the R workshop? Or should we just leave it to the Slack? I think... So, is there a specific type of note that you were hoping to see there? These are just Google Docs. Yeah, yeah. We'll put this on ice. Okay, let's put it on ice, but let's think about it, maybe at the first break. One issue is that this Google Doc would be more permanent than the Slack chatter. Slack chatter doesn't last forever, so this might be a good idea. Okay, so we'll come back to it. You go on, and then I'll put a link in the Slack. Perfect. Sorry about that, guys; I will update this. I think for this first section we're fine without it, but we'll link to one so we can all be sharing notes. We've found in the past that it's really helpful to compile lots of user-guided notes through this workshop as we go, so this link will be updated.
So, some tips for further reading. There's the World Wide Web: R is open, it's free, and because of that many, many people are contributing help guides all over the Internet, including workshops and courses. You can find additional ones online that are specialized to how you're actually using and analyzing your data, so I really encourage you to use Google to find your solutions. Most often, if you're encountering an issue, someone else has encountered a similar issue before and published the result online, so I really encourage you to take advantage of that. But if you're someone who wants a more analog kind of user guide, this book is outstanding. I'm not paid by them or anything, but Andy Field is hilarious, and I think he makes analysis in R really accessible in Discovering Statistics Using R. I can't recommend this book enough. So why R? Why are we teaching this whole workshop on this one piece of software? Well, number one, it's free, so it really makes science accessible from that perspective. And like I said, because of that there's such a massive user base, and base of contributors to R via packages, that it's a super diverse and flexible environment for you to work in. And this is just repeating what I said: whatever your problem is, in all likelihood someone else has solved it and published their code. Base R can be expanded with packages, which makes the functionality really limitless. RStudio, which you've been told to install as well, provides a really user-friendly graphical interface, which we'll get used to working within. It makes base R much easier to work with, and it lets you organize all of your code and output. Without further ado, let's go to the R environment. The first time you open up RStudio, when you've had a fresh installation, you'll see something like this.
This is what you're going to be seeing. You'll have the console pane, and you'll know it's the console because it says so right here: Console. This is where your code runs, so you'll see your code and then the output of that code, all in this pane. There's also an environment pane here, with Environment, History, and Connections. This tells you the variables you've defined and all the code you've run. That'll become more concrete as we go. And finally, down here, are Files, Plots, Packages, and Help. The Files tab shows your computer's files. The plots you generate will come out in this window, and you'll be able to export them to files, copy and paste them from here, and zoom into them if you'd like. And then there's also Help, so you can get help documentation from this pane. So let's get started on your first script. Somehow I thought I updated the date on this one and it regressed on me, so that'll also be updated; sorry, folks. The first script we're going to work on is actually the script that you used to get started, your pre-work. We're going to break that down and really dissect what you ran. By the end of this lecture you'll be able to set your working directory, read a CSV file (that stands for comma-separated values) into R, create a grouping variable, like a factor, in R, and create and modify publication-quality R plots. Okay, so to start your first script, go to Open File and choose the intro to R pre-work script. All of you will have done this for your pre-work. Then you'll have this new pane open up right here. This is the pre-work script that you were sent to work on, and it's in the script pane. This is where you write your code. Essentially, this pane is just holding a text file, this script.
These .R files are just text, and the purpose of having the text is to document all the code you're writing so that you can run it down in the console pane. You really just want to keep a record, and that's what the text here does. So this is your script. All right, once everyone has this set up, with RStudio open and your script open here, go ahead and click yes, and we'll move forward once everybody's clicked yes. Click no if you're having issues getting to that point. All right, we've got a lot of people. For the person who clicked no, do you mind opening your mic, or maybe speaking up on the Slack? Sorry, I'm actually going to get to my Slack window here. Yeah, I just have the previous version already on the screen, because I ran it yesterday, and I deleted the setwd. Okay, that's fine. But otherwise you have a script here? Yeah. Great, so you're good to go. Perfect. One sec. There we go. Okay, I think we're good to go. Wonderful. All right, so you've got your script open, and we're going to go line by line through it now. First I'm going to dash through all the lines, and then we're going to break them down. The first line covers the packages we're using in the workshop. There are going to be a few more that you'll install down the line, but you'll be so familiar with packages at that point that you'll be able to do it on your own, no problem. The next line sets your working directory. It's good to save this as a line in your code, because it tells R where the data sits and where you're writing your data to, so we'll get that set up for everyone here. The next line reads in your data. Notice that there's no file path beyond just the name of the data file. Suppose you wrote the entire path to the data.
Then you wouldn't need to set your working directory. Otherwise, unless you're in something like your base directory, R is going to look in your working directory for this file. Here we're encoding a grouping variable. Here we're loading the ggplot2 package that we installed, because we're going to use it for a graph. And here we're plotting a box plot of biomarker versus exposure group using ggplot, so using a function from the ggplot2 package. Okay. So now let's get started with the packages we'll use in this workshop. Packages are just collections of R functions. They can be created by anyone, and then they're submitted to CRAN, the central R archive. In the pane with Files, Plots, and Help, you can click on Packages and see all the packages installed on your machine. What happens when you install a package is that it comes from the internet and is saved in a file on your computer locally, and that's all that means. It just means that those functions are accessible. To install packages you can use the install.packages function, which is what is written in your script. You always want to have quotes around the name of the package when you're installing it. You only need to install a package once, but you need to reference it each time you use it, and that's why we have this library(ggplot2). Installing a package takes it from the internet onto your computer, but library takes it from your computer into your R environment, similar to reading in data. It brings the functions from that package into your local R environment for you to use. Okay, so you don't have to install packages programmatically, and when I say that I mean you don't have to use install.packages to actually install them.
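As a rough sketch of the install-once, load-every-session pattern described above: the only package named in the workshop script is ggplot2; the `has_ggplot2` variable and the availability check are illustrative additions, not part of the pre-work script.

```r
# Step 1 (once per machine): download the package from CRAN.
# Left commented out because it needs an internet connection and only
# has to be run a single time. Note the quotes around the package name.
# install.packages("ggplot2")

# Step 2 (every new R session): load the installed package.
# requireNamespace() returns TRUE/FALSE instead of stopping with an error,
# so this sketch also runs on a machine where ggplot2 is missing.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  library(ggplot2)   # functions such as ggplot() are now available
} else {
  message("ggplot2 is not installed yet; run install.packages(\"ggplot2\") first.")
}
```

In a real script you would normally just write `library(ggplot2)`; the check is here only so the example runs on any machine.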
You can also just use this Install button on the Packages pane. It will open up a window that lets you type in the name of the package, and you'll notice it auto-completes package names as well. I'm going to go to my R window here and do a quick demo of this. When you have lots and lots of scripts, you also have them here in little tabs. All right, I'm here on this tab. I can change the size of these different panes by clicking on the border in the middle here and dragging when this comes up. I can go to Packages, and I can click Install. Then here I type ggplot2, and it gives me options for what I can install as well, so that's another option. And you don't really need to be saving it. Like I said before, this script is a log of what you've done, but because you only install packages one time, you don't need to have that saved in your script, since you don't need to run it every single time. That's why a point-and-click version of installing your packages should be totally fine. Okay. So here we installed ggplot2, and ggplot2 should be in this Packages pane. That's a way to confirm that it's there, but nonetheless you'll still need to load it with library to use it. Okay. Next, we want to set a path to our data. We do that using set working directory, setwd. There are two options to go about this. One is doing it programmatically, as I'm showing here. So setwd here, this is common on Microsoft, like a desktop computer. Here you'll have... Yeah. Sorry. Make sure... Yeah, there we go. So, often on a Windows PC you'll have a drive that you're specifying, but if you're on Linux or Mac you'll often start with the tilde when setting your path. Okay. So you can type your path into your script, and if you start to write the path, you can press Tab to show the available files and folders as you set your path.
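A minimal sketch of setting the working directory programmatically. The two commented paths are hypothetical placeholders (the real path depends on your machine), and `tempdir()` is used only so the example runs anywhere.

```r
# Hypothetical paths for illustration only; substitute your own folder.
# Windows:    setwd("C:/Users/yourname/Desktop/workshop")
# Mac/Linux:  setwd("~/Desktop/workshop")

old_wd <- getwd()        # remember where R is currently pointing

setwd(tempdir())         # tempdir() always exists, so this runs on any machine
current_wd <- getwd()    # confirm where R is pointing now
print(current_wd)

setwd(old_wd)            # restore the original working directory
```

`getwd()` is a quick way to check the result from code, in addition to the path RStudio displays.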
I'm going to show you here. This is not the right path on this computer. So what I can do is say, okay, I know I'm in the C drive. I press Tab here, and I want Users. I just typed a U, and it updated my options, so I'm going to click Users. Then I'm going to push Tab again. Whoops, make sure my cursor is here. I go to that forward slash and press Tab again, and I want to go to Lauren Erdman. I press Tab. I know it's in teaching. Oops, no, I'm on Desktop here, so Desktop, teaching, and then CBW, and then intro to R. Let's see. Okay. And then I would do Command or Control Enter here. That's skipping ahead a little bit. Once you have your path written, if you're on a Windows or Linux machine you use Control Enter, but if you have a Command key you use Command Enter to run that line of code. I just want to show you guys again: I have my cursor anywhere on this line, and what I did was Control Enter, and you'll see it ran down here. Control Enter, one second. So you'll see down here that your code ran. Sorry, guys, I'm just making sure I'm caught up on the Slack. Okay. Now, there's another option for setting your working directory. If you don't want to go through searching with Tab, Tab, Tab, and it's easier to do it from a file explorer perspective, you can point and click: you go to Session, Set Working Directory, Choose Directory. That opens up a file explorer. Here you see I'm choosing this workshops 2020 directory, and once I select that directory, again, it runs it down here. So that point-and-click version is still running that line of code. Okay. So let's go back here. I actually have lots of files in this folder, but it's not showing any files, because a working directory is a folder.
So you want to look for a folder, and once you're in that folder, you say Open, and that sets your working directory. Once you do that, it will run, as I said before, and show up down here. It's also going to show up in your History tab, because that line of code ran, so you'll have that line of code documented in the history of your RStudio. What you can do is insert it into your script. I think the order of operations is: click here on your History tab, select this line of code, and then click To Source, and it will add it over in your source pane. This is another way to save your working directory in your script without having to manually type that whole directory out. Once that's there and has run, you can also see it down here: right above your console pane is how you know what your working directory is, what R is pointing to. Okay. So once you've set your working directory and you have the directory that you want, the one that's got the example data one in it, go ahead and click yes. Set it any way you'd like: point and click, auto-complete, however you'd like to run it. Once it has run and you have the correct directory in your script, click yes. And go ahead and click no if you're having issues, or open up your mic, no problem. We're getting a few more. Okay. All right, I'm going to go ahead and move on. Looks like we're good. So now you've set your working directory, which is a very important step. Now you can import your data into the R environment. Because you're pointing to that data from your R environment, you can just run this line of code. I'm going to go back to this, but if you want to, you can just have your cursor anywhere on this line.
And again, with Control Enter, it will read that data into R. So let's talk about this data. R can read in all kinds of data types: comma-separated values, text files, Excel, LibreOffice. It can read in any text file, actually. If you have genomic data or anything else saved in a text file, even if the file name doesn't say .txt at the end, if it's .bim, .bam, .sam, et cetera, it will read it in just fine. And for Excel files and the like, sometimes it just needs an extra package to read them in. One very R-friendly format is comma-separated values, .csv. This is how it looks in Excel: if you open a CSV file in Excel, it's a nice spreadsheet format, very simple to read. But how it looks in Notepad, and how it's encoded on the computer, is with commas, and that's why it's called comma-separated values. On our first line we have our header. As you can see, my first column is sample ID; I've named it sample ID. My next column is exposure; I've named it exposure. And then I have biomarker values here in this third column. These are all separated by commas, with a new line, like pressing Enter, marking each new row. Okay. So here we have the name of our file in quotes. read.csv is the function that's actually reading in this file. And we set header = TRUE because we have headers on each of those columns. If we said header = FALSE, R would say, okay, great, the entire file is data; I'm not going to treat this first row as a title for each column. So we have header = TRUE, of course, because we have headers. And then as.is = TRUE makes it so that any text, anything that is a character string and not a number, is going to be treated just as a character string. It's not... Just a second.
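To make the `read.csv` arguments concrete, here is a small self-contained sketch. The file contents, the column names (`sample_id`, `exposure`, `biomarker_value`), and the object names are made up for illustration; the workshop's actual data file is not reproduced here.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sample_id,exposure,biomarker_value",   # header line
  "A,low,12.5",
  "B,high,30.1",
  "C,low,14.2"
), csv_path)

# header = TRUE: the first line supplies the column names.
# as.is = TRUE: text columns stay as plain character strings.
df_example <- read.csv(csv_path, header = TRUE, as.is = TRUE)
str(df_example)

# header = FALSE: the first line is treated as data, so there is one extra
# row and R invents the column names V1, V2, V3.
df_no_header <- read.csv(csv_path, header = FALSE, as.is = TRUE)
dim(df_no_header)
```

With a working directory set, the first argument would simply be the file name in quotes rather than a temporary path.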
So just make sure you mute yourself. Let me see if I can turn off people. Great. Okay. One sec. Just a brief intro. So here I just dropped a lot of jargon: functions, options, arguments. We're going to jump into these now: commands, functions, and objects in R. A command is what you want R to do. It's a full line; you're commanding R to do a thing. Here we had a command that said set working directory. That was telling R to do a thing. It requires a function, and here setwd is our function, and our argument to it is the working directory path that we set. Commands can also define objects, so we can have an object made from a function working on some arguments. An object is anything you create in R, and you can see what objects exist in your R environment here. I have run lots of things, and that's why my environment is filled with so many objects; these are all objects I've created in R. Yours should be relatively empty, unless you've run some of these lines, in which case maybe you have df here, for example. So these are objects you created in R. An object could be a single value, a matrix, a list of values, a statistical model, or a function. It could be all kinds of things, but you created them in your R environment. A function is what you use in R to create your objects and your output. Functions transform things: some objects into other objects, or into plots, for example. This is a silly example, but let's say we want to create an object called ducktales. This is our variable. I'm going to make it using this equals sign. I can also use an arrow, which I'll show you later. Here I'm going to make it by concatenating "Uncle Scrooge" in quotes, so this character string here, then "Huey", "Dewey", and "Louie".
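The object-creation example can be sketched as follows. The identifier spellings (`ducktales`, `ducktales_arrow`, `ducktales_expand`) are assumptions, since the transcript only gives the spoken names.

```r
# Create a character vector with c() (concatenate), assigning with "=".
ducktales = c("Uncle Scrooge", "Huey", "Dewey", "Louie")

# The arrow "<-" does the same assignment and is the more common R style.
ducktales_arrow <- c("Uncle Scrooge", "Huey", "Dewey", "Louie")
identical(ducktales, ducktales_arrow)   # TRUE

# c() also accepts an existing vector, so it can append new elements.
ducktales_expand <- c(ducktales, "Donald Duck")

length(ducktales)          # 4
length(ducktales_expand)   # 5
```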
Okay, so that's the object we're creating. We use the function concatenate, c. We open the function here and we close it here, so functions like to have opening and closing parentheses. And these are the arguments that we send to this concatenate function; we're saying concatenate these all into a vector. So here we create a vector, ducktales is our object, and it's a vector of these names. Now, to concatenate something additional onto ducktales, we can create a new object; we call it ducktales expand. Here we concatenate ducktales with "Donald Duck", and it creates a new object that is an expanded vector. Note that there are no spaces between the function name and the parenthesis you're opening. You write the name of your function, like setwd here, then open parenthesis; here it's c, open parenthesis; and over here we have read.csv, open parenthesis. You don't want a space here, and you also want to close your parenthesis here. What's nice is that RStudio highlights the parenthesis that corresponds to the one your cursor is on, so you'll know if you've closed your parentheses. But R won't consider you finished with your function or command until it sees the closing parenthesis. So it's very important that you close up all of those commands; that's a very common error. Okay. I just want to quickly review what I just said, because this is a common error. Let's say you didn't have this closing parenthesis here. A lot of X's are going to show up, because it's looking for that closing parenthesis. Also, if you run this line, you'll get this plus sign down here. It's a little bit hard to see. See, here I ran it and I don't have this closing parenthesis, so it's looking for that closing parenthesis somewhere. There are two things you can do when you start to get these plus signs. Notice I'm trying to get out of it: Enter, Enter, Enter.
It's still there. The plus sign's still there; R really wants you to close up that statement. There are two things I can do to get rid of this. One: I know that it's missing a parenthesis, so I can just type the closing parenthesis and then run it. Great. It ran the line, it's happy with it, it got the data in, and it's correct. So that's one option. Just to reiterate: I ran this line, I have lots of space, and then I just closed up my statement, and it didn't give me a plus sign anymore. It gave me this caret, the > prompt, here. That tells me I'm good to go and I can keep running code. There's another option, though, if you find yourself in this situation. I'm going to get myself in the same situation again, so I've got this plus sign. Now I'm trying to run dim(df). Oh, wait, no. Here we go. Sorry, it didn't give me the error I expected. It's giving me errors. So here it's erroring out and it's letting me start a new line. Sometimes it won't let you do that; sometimes you'll run it and you'll keep trying to get out of it. The way to get out of it at that point is just to use your Escape key. So either create an error for yourself, press Escape, or close your statement with your parenthesis and run the line correctly. Those are the options to get out of this situation. You'll certainly find yourself in that situation again; I do myself. All right. So now let's go more deeply into these vectors, and we're going to see data frames and lists in R. Vectors are one-dimensional objects. Like you saw before, we had Huey, Dewey, Louie, and Donald Duck all concatenated together. That's one dimension: it's just a long line of character strings that we've strung together. Here I'm making another one, myvec, and I make it using concatenate with cat, dog, fish, hamster, and parrot. Okay. Say I want to access an element.
So I've created this and I want to access the second element. Then I say myvec[2], and the result is "dog"; there should be a closing quote there. Okay. So here I'm going to make a vector, myvec, with c, then "a", "b", "c". Notice I'm putting spaces after the commas here. I don't have to; these spaces are not essential. But the space after this function is okay. I run this, and I've created a new object, so myvec shows up here under Values. Now I want to access the second element, and the second element of this is "b". So I type myvec, then square brackets with 2, and I get "b". Similarly to here, where I'm accessing the second element and getting "dog", I could have accessed the first element by changing that 2 to a 1, and I get "a". So vectors are one-dimensional, and you can access each of the elements using square brackets with the position of the thing you'd like to access. Okay. Data frames are two-dimensional. If you think about the data that we're reading in here, we're bringing it in as a data frame. A data frame will have different columns, all with the same number of rows, so it's like a spreadsheet, essentially, and this is how your data is usually formatted. You can read one in like we just did; we create a data frame when we use read.csv. So df here is a data frame that we've created. But we can also create a data frame using the data.frame function. Here I use the data.frame function. My first header is header name one, and I assign myvec1 to be that column. Then I have a second column; I'm going to call it header name two, so that's my second header's name, and myvec2 is going to be the second column. So here's myvec2: I've defined it up there and I'm using it down here. And myvec1: I define it up here and I'm using it down here.
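A short sketch of vector indexing and of building a data frame by hand. The animal vector follows the spoken example; the contents of `myvec1` and `myvec2` and the column names `header_name1` and `header_name2` are assumed spellings and values, chosen only for illustration.

```r
# A one-dimensional character vector.
myvec <- c("cat", "dog", "fish", "hamster", "parrot")

myvec[2]   # second element: "dog"
myvec[1]   # first element:  "cat"

# Two vectors of the same length (contents are made up) ...
myvec1 <- c("a", "b", "c")
myvec2 <- c(10, 20, 30)

# ... become the two columns of a data frame; the argument names
# become the column headers.
mydf <- data.frame(header_name1 = myvec1, header_name2 = myvec2)
mydf
dim(mydf)  # 3 rows, 2 columns
```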
What this is going to create is essentially like what we have in our data right now: a data frame that has two headers, header name one and header name two, with the first vector as the first column's contents and the second vector as the second column's contents. Okay. The column names, though, are also a vector, so you could use colnames or names to access them. Here we've read in df, so we could say names(df) and we get the column names. And if we want to just see df, we can type it in and see it down here. Notice I'm in the console pane; I'm just typing these in and pushing Enter. We can see this is also a data frame, but instead of header name one and header name two, here our header names are sample ID, exposure, and biomarker value. And these are the vectors: A, B, C, D, and so on are the sample IDs, then the exposure values and the biomarker values. These vectors are the columns in our data frame. Is there a question? Please do ask any questions if you're encountering them; I'm happy to answer them, over your mic or on the Slack is just fine. Okay. To access column one, you can use the dollar sign, so mydf, dollar sign, header name one will give me the contents of myvec1, this first column. Going back here to df, notice that on my machine, at least, it's giving me auto-complete, so it lets me choose from among these. If yours doesn't, you can press Tab to get the same thing: type the dollar sign and then Tab to see the options. If you want to use your mouse you can click on one to choose it, but you can also use your arrow keys to move between them and just push Enter when you're on the one you want. And then I run that to print it, so now I have a vector. I've just accessed that vector; that's my third column here.
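As a sketch of reading column names and pulling out one column with the dollar sign: the data frame below is a made-up stand-in for the workshop data, and its column names are assumed spellings.

```r
# Made-up stand-in for the workshop data frame.
df_example <- data.frame(
  sample_id       = c("A", "B", "C", "D"),
  exposure        = c("low", "high", "low", "high"),
  biomarker_value = c(21.5, 34.0, 18.2, 40.7),
  stringsAsFactors = FALSE
)

# The column names are themselves a character vector.
names(df_example)
colnames(df_example)   # same result for a data frame

# The dollar sign extracts one column as a plain vector.
biomarker <- df_example$biomarker_value
biomarker
```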
Okay, another option is to use square brackets, similar to what we did with the vector. But because a data frame has two dimensions, you have a comma to specify which dimension you're slicing, because you could be slicing either the row dimension or the column dimension. Here I want to access my first column, so I put it on the right-hand side of the comma: I say 1, and that's the first column. If I wanted to access the first row, I would type 1 and then a comma with empty space after it. When there's empty space on one side of the comma, it means select everything. So here, this is all rows, because it's empty before the comma, and just column one. Let me see. If you want to specify both your rows and your columns, you can do that in two ways as well. You could make it a vector first. Here I make it a vector by accessing that first column, and then I say I want the third element of that first column. Row three of column one is the third element of the column one vector; that's how I'm accessing it here. But I can also treat it as a two-dimensional object and say, give me the third row of the first column, and slice it from that perspective. I'm just going to show you guys that over here. If I want biomarker value, so the third column, but I want the fifth row, then I could say I want the fifth element of this biomarker value vector. Here I've just put 5 in square brackets and pushed Enter. So that's 56, and you can tell because these are the row numbers here, so 56 is definitely the value in the correct row. All right. So that's one way to do it, but I could also use df with square brackets. I want the fifth row, and rows come first. Then I have a comma; a space could be there or not, it doesn't matter. And then I want the third column, so I write 3 here.
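The two routes to the same cell can be sketched like this. The data frame and its values are invented for the example (they are not the workshop data), and the result variable names are illustrative.

```r
# Made-up stand-in data with six rows and three columns.
df_example <- data.frame(
  sample_id       = c("A", "B", "C", "D", "E", "F"),
  exposure        = c("low", "high", "low", "high", "low", "high"),
  biomarker_value = c(21.5, 34.0, 18.2, 40.7, 29.9, 25.3),
  stringsAsFactors = FALSE
)

df_example[, 1]   # empty before the comma: all rows, column 1
df_example[1, ]   # empty after the comma: row 1, all columns

# Route 1: take the column as a vector, then index the vector.
via_vector <- df_example$biomarker_value[5]

# Route 2: index the data frame in two dimensions, [row, column].
via_matrix <- df_example[5, 3]

identical(via_vector, via_matrix)   # TRUE: both are row 5 of column 3

# A vector of positions selects several rows at once.
rows_235 <- df_example[c(2, 3, 5), 3]
rows_235
```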
And it's going to give me the same value, the fifth row and the third column. Okay, you can also put vectors into these positions. Let's say we want the second, third, and fifth rows. We use that concatenate function that we created a vector with before, and I create a vector here on the fly: c(2, 3, 5). This vector is the argument for the rows, so I want the second, third, and fifth rows, and I want only the third column. I run that, and I'm getting the second, third, and fifth rows of the third column. So you can slice many different dimensions of your data frame using these techniques. We're going to go through this a bit more later, so don't worry if it's not totally solid yet. It takes some time, but once you've got it you really will have it. All right. Now, lists. Lists are amazing. Honestly, I discounted them a lot when I started R, but now I use them often, and once you get comfortable with them they're very, very powerful. A list is kind of just bulk storage. A data frame is a very special version of a list, because it's combining vectors that are all the same length. So it could be thought of that way, but don't worry about it; lists are much more flexible than that. Here I'm creating a list using the list function. The list has two objects in it. Object one is going to be the name for my first object. This is totally arbitrary; again, I've chosen this. And I'm going to make it my vector. And object two is going to be the data frame that I created before. So object one is that first vector I created, and object two is that data frame I created before. It's all stored in a list, and see, it's just arbitrary. Vectors have to have the same type of data in every element.
Data frames have to have columns that are the same length, but a list can hold whatever you like, so a list is just binding together lots of arbitrary stuff. If you want to access things within a list, you need to use the double brackets or the dollar sign. Say I want to access column 2 of object 2 here. I use my list, dollar sign, object 2, so this gets me into object 2. Once I have that, I use the same technique I used on my data frame, so then I say dollar sign and the header name of column 2. So I access object 2 first, and then I access the second column of object 2. A second way you can do that is using these double brackets. Where we used the single bracket for our vectors and our data frames, for lists you need to use double brackets to get access to the object in the list. So here, in my list, I go in and get object 2. Now this whole thing is a data frame, from which I can use my dollar sign to access that second column. So I'm going to do a little thing here. I'm going to say my_list equals list, and my_vec is going to be my first list element here, and I'm just going to make my_vec be A, X, Y, Z. Notice I'm using different quotes. It's actually okay to have double or single quotes here; your vector should be fine with that. And then I'm going to say my_df is going to be the df that I had before. So notice I'm creating this vector on the fly in here, and for this one I'm assigning it using an object I've already created in my environment. Both are fine to do. So now I'm creating a list, my_list. Once a list is created, it is added to your environment, so I can see that I have one: a list of two elements. If I want to see the names in it, because I'm not sure what things are named, I can say names of my_list, and I see my list names. Okay, and then, similar to what I've done here.
Okay, I want to access this biomarker value column. I know my_df is in the list, so I can say my_list, use the dollar sign, then my_df. Here I'm just arrowing down to it, but I could also start to type it and it will go to the top there, and I can tab to autocomplete it or push enter. If I run just this, I get that whole data frame out. Now I want that third column, so I just add another dollar sign there and, boom, third column. So you're just iteratively accessing these elements. Another way we could have done that is my_list with double brackets. I use the name of the element, my_df, in quotes. And then I can say dollar sign. Again, it's giving me the options. Yours may or may not do this, but you can often use tab, and then after the dollar sign I get that column. And then let's say I want the fifth element, like before. Again, I can use those square brackets, just added on to the end, and now I'm accessing that exact element. So I'm getting into my list, I'm accessing my data frame, I go to the biomarker value column, and I get the fifth element. Again, this is a lot right now, but I think these things will solidify as we try them again and again. This is just to introduce you to it. So, to recap: vectors are one dimensional, data frames are two dimensional. For a vector you use square brackets with no comma. For data frames you use square brackets with a comma: rows before the comma, columns after the comma. Lists are one dimensional, but they're bulk storage. And the reason I say one dimensional is that here, instead of writing the name of the element, I could also have said 2, because it's the second element of my list. That also gives me the whole data frame.
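The list access patterns from this recap can be sketched roughly as follows. The element names my_vec and my_df, the column names, and all values are assumptions made for illustration; only the access syntax is the point.

```r
# Hypothetical pieces to store in the list
my_vec <- c("x", "y", "z")
df <- data.frame(
  id              = 1:8,
  exposure        = c(0, 0, 0, 0, 1, 1, 1, 1),
  biomarker_value = c(12, 25, 31, 40, 56, 62, 70, 81)
)

# A list can bundle objects of different types and shapes
my_list <- list(my_vec = my_vec, my_df = df)

names(my_list)                     # which elements are stored in the list

# Dollar sign: step into the list, then into the data frame
col_by_dollar <- my_list$my_df$biomarker_value

# Double brackets with the element name (quoted), then a column, then an element
fifth_value <- my_list[["my_df"]]$biomarker_value[5]

# Double brackets with a position: the second element is the whole data frame
second_element <- my_list[[2]]

fifth_value
```

Access by name tends to be more robust than access by position, since it keeps working if elements are later added or reordered.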
Okay, so in those double square brackets I can also just say which number, but it's only ever a single sequence of numbers; there are no two dimensions here. Once you get into the data frame, then you can access things as a two-dimensional object. So here, again, I can say I want the second, third, and fifth rows of the third column, and I'm doing this on the second element of my list, which is a data frame. So now I'm doing a two-dimensional extraction. Okay, so lists: they're one-dimensional bulk storage, and they can be anything all bound together. All right. So here, we're importing our data as a data frame. Once you run that read.csv function, it brings your data into your environment as a data frame, a two-dimensional object that you're now using with your data. You've named your data frame df, and that's why it creates that df object in your environment pane. The file name containing the data is here. We have header = TRUE because we have headers. as.is = TRUE means: don't convert the variables into grouping variables, just treat them as strings. And of course we have our function, and then these are all arguments to our function. The first argument is the file name, the second argument is header = TRUE, and as.is = TRUE is our third argument. And you can press tab. I'm going to emphasize this again and again, because if there's any spelling error you're going to get an error message in your console, and pressing tab helps you avoid those kinds of spelling errors. All right. So you place your cursor on line six and press control enter. You should have df show up here, and it should be eight observations of three variables. You should also see it run down there. And once you have df in your environment, go ahead and click yes.
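A minimal, self-contained sketch of the read.csv call described above. The workshop's actual file name is not shown in the transcript, so this example first writes a small made-up CSV to a temporary file; the column names and values are assumptions.

```r
# Write a small hypothetical CSV so the example can run anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,exposure,biomarker_value",
    "1,0,12", "2,0,25", "3,0,31", "4,0,40",
    "5,1,56", "6,1,62", "7,1,70", "8,1,81"),
  csv_path
)

# header = TRUE: the first line holds the column names
# as.is = TRUE:  keep character columns as strings rather than converting them to factors
df <- read.csv(csv_path, header = TRUE, as.is = TRUE)

dim(df)   # number of rows and columns
str(df)   # structure: one line per variable with its type
```

If the path is wrong, or the working directory is not where the file lives, read.csv stops with an error instead of creating df, which makes the environment pane a quick check that the import worked.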
And notice, if your working directory is not set correctly, it's going to give you an error too. So that's a way to know if you've actually set it to the place you want. All right, it looks like we're close. Don't forget to click yes. Very nice. All right. Good deal. Okay, perfect. And it looks like we have a lot of people on a second machine as well. All right. And if there are any issues, remember to click no, or to message on the Slack, or message the TAs to have a breakout room set up. Okay, I'd say that's okay. Quick yeses. So I'm going to go ahead and clear all, and we'll keep moving on. We're going to take a break here soon. I'd like to have a break about every hour, just so we're all staying fresh. So let's move on to data frames. We've talked about this already: data frames are the fundamental data structure used in R. A data frame is just a collection of variables with the same number of rows and unique row names, so you cannot have the same row name twice. As I showed you here, usually the row names are just numbers, so that's what they are here, and that keeps them unique. And then once you've read it in, it'll go to your environment tab, which is a great way to verify that you've read it in. You can also click this little guy here, and it will give you a view of it. So I'm going to go ahead and click it. Here we go. It opens it up in a kind of spreadsheet view. This is a nice way to just have a look at your data in case you're not sure what you're looking at or what's been read in, and to make sure it's correct. Okay. All right. So with that, I'm going to go ahead and have a break right now for five minutes. And then when we come back, we'll start talking about grouping variables. So everyone can refresh their water, you know, go. Okay, so now we're on our next line of code. We've read our data into our environment. We've got it in our environment pane, good to go.
Two lines of code really need to have been run, but you could have run many, many more and that's fine too. We need our working directory set, and we need our data read in and saved in a data frame. From here, we're going to create a grouping variable. I've been talking about these grouping variables, and that as.is = TRUE alludes to them. as.is = TRUE means: R, don't create my grouping variable for me, I want to create it for myself. Very often in research we have a variable that is categorical. Here we have cases and controls, but we've encoded them numerically. Even though they're a category, R will interpret them as a number, like we have shown here, and we don't want that. So we need to tell R that it's categorical and not a number. First, though, I'm going to do a deep dive into the dollar sign, which we've seen quite a bit of. To access certain variables, we use the dollar sign. Here we're accessing exposure, like we did previously when talking about data frames and also lists, but you can also create a new variable using this dollar sign. So here you can access exposure, but you can also create exposure group using the dollar sign. So type df$exposure to access exposure. You can also highlight a snippet of your code and run it to see just that part. So I'm going to go to my R script here. What I'm saying is I can highlight this piece of code, press control enter, and it will run just that individual piece of code. Before, I was showing you that you can run the whole line with your cursor anywhere on the line: at the end of the line or at the beginning of the line. You can even highlight the whole line and run it, but you don't need to. Here, I only want this part of my line of code run, so I just highlight it, and it only runs what is selected. Okay. So I'm going to run this here to clean up my code.
I actually pushed enter, and I created multiple lines. And R is okay with that; you can break up your code across multiple lines. But it's best, in my opinion, to do it after a comma, because R will align it for you. You're continuing to write in your function, and you haven't closed your parentheses yet, right? We have an open parenthesis here and it isn't closed, so it'll line you up to keep adding things down there. All right. So anyway, all that is to say, I'm going to bring that back. You can highlight this and run. Once you have highlighted and run df$exposure, go ahead and click yes. You just want to see that one vector, just a vector of ones and zeros. Okay. We're close, looks like a few more. All right. So from here: vectors. We already talked about how they're one-dimensional objects. Here, ours is eight zeros and ones all concatenated together. The length of this vector is very easy for us to just count, but suppose you have hundreds of numbers and you want to know how many there are. You can use the length function. If you wrap length around df$exposure, you'll see there are eight elements in this vector. Okay. We want to create a grouping variable, though, out of that vector of zeros and ones. So, a factor: it's the term used in R for grouping variables, or categorical variables. To do categorical analysis, you have to define factors. Otherwise, if R sees numbers, it's going to interpret them as numerical values and treat them that way; it will not treat them as a category. In the example below, we want our exposure group, our new variable, to have two levels, case and control, and we want them to correspond to the zeros and ones that we see in that exposure variable. The first level is the baseline in all analyses.
In our case, it's not such a big deal which one is the baseline, though in general you want control to be the baseline, right, because you want to measure the effect of being a case. Or you may have a non-exposed group and want to measure the effect of the exposure. In that case you want your baseline to be no exposure, or control, and the next level that you compare to it to be exposure, or case. It becomes much more complex if you have more than two groups. Let's say you have multiple cities. Then you need to choose one of your cities to be your baseline group, and all the other cities will be compared to that one. So setting the first one is the most important, and that's done by order. Okay, so here, if you run the line of code that is shown, you will get this output from df's exposure group. Here we're setting the levels to c(0, 1), because we've got zeros and ones. We're setting zero to be the first group and one to be the second group, so zero is the baseline. It's the group that everything else will be compared to, and in this case we only have one other group, so everything else is just cases. And we're labeling them Control and Case, and these labels correspond to the order of the zero and one here. When you run this full line, it runs with no output. But again, if you select just this piece, and you say I just want to see df's exposure group, then you can see this: Levels: Control Case. Down here, this is telling you that Control is the baseline level and Case will be compared to it. And this is showing your actual data: control, control, control, control, case, case, case, case. That's what we expect to see, because we had four zeros followed by four ones. All of those zeros should be controls, as shown here, and all the ones should be cases. Okay.
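As a rough sketch of the factor being described: the call below assumes the coded column is called exposure and the new column exposure_group (the second name is a guess at the workshop's naming), and it uses made-up data with four zeros followed by four ones.

```r
# Hypothetical data: 0 = control, 1 = case
df <- data.frame(exposure = c(0, 0, 0, 0, 1, 1, 1, 1))

length(df$exposure)   # how many elements the vector has

# levels: the values found in the data, in baseline-first order
# labels: human-readable names, matched to levels by position
df$exposure_group <- factor(df$exposure,
                            levels = c(0, 1),
                            labels = c("Control", "Case"))

df$exposure_group           # the data, followed by "Levels: Control Case"
levels(df$exposure_group)   # the first level listed is the baseline
```

Swapping the order inside both levels and labels would make Case the baseline instead, which changes what later comparisons are measured against.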
So the exposure variable consists of ones and zeros. To tell R this is a factor, we use the factor function. And we use the dollar sign there because we're adding a new variable, called exposure group, to df. We're creating our new variable from the existing variable, df$exposure. So it says: okay, I'm going to create a new factor. I want to add it as a variable to df. I'm going to reference this existing variable to create the factor. The levels are going to be zeros and ones, and zero is my baseline, because I'm setting it first in order. I want to label these levels as Control and Case, and so here we match the order: 0, 1 and Control, Case. So all the zeros are called controls and all the ones are called cases, and zero, Control, is my baseline group. Once you have created this, there are a few ways you can check it. You can do what I did here, where you highlight it and print the output. You could also look here: you have four variables, and that's good, so a variable has been created. You could go to the view, and you can see you've got a new variable here. That's another way to check it. You can also just print your whole df, so you could say df here, and that will also show it. So there are multiple ways to check. But once you've done it, go ahead and click yes, and let's make sure that we all have a nice factor added to that data frame. You can also use the indexes that I showed you before: you could select the fourth column of your data frame to see it. Awesome. Looks like we're almost there. You can click no, too, if you're encountering issues, even if you're the only one to encounter this issue this time. Often the same bugs will keep coming up for people. Great. Who's got the no?
Here, you can go ahead and open your mic and talk to the group, or you can go ahead and use Slack. I'll get on the Slack channel. Hey, sorry. So, you mentioned we should see something in the environment pane, and I don't see anything there. So in the environment you should have df. I have the df. Okay, and are there four variables? Yes. Okay, perfect. Great. And then, you know, nothing new. Okay. Yeah, so it should be good to go. That's great, though. Perfect. All right. Oh, got that. Okay. Thank you. Perfect. No problem. I'm just highlighting everything here. Okay. All right. Sorry to interrupt; there's a question in Slack: "Do not know how to see it in the environment." Okay. So the environment is just the pane here. Sometimes it's down below, so sometimes these two are swapped, depending on how they're laid out. You should see df in your environment. If you don't see df, that means that you haven't read in your data. So if there's no df here, you'll want to go back to this line, have your cursor on it, and press control enter to make sure that you read it in. And then you run this line next with control enter. See how, once I rerun df, I actually overwrite the change I made? So I ran df here. I'll do it again: cursor on that line, control enter. Now when I look at my environment pane, I've got three variables. So it reverted back to the original one that I read in, and now I need to run this new line again to get that fourth variable. So you press control enter and you see a fourth variable is added on there. All right. Wonderful. Good deal. Nicely done, everyone. So we'll go ahead and clear all and move right along. We've got a factor now, and your output should look like this. We've already checked our output. And like I said before, you can highlight a snippet of code you'd like to run.
Or, if you want to see your whole output, you can surround your whole command with parentheses, and it will print the output at the same time as running the code. Just to show you more concretely, let's say I wanted to see this output at the same time as I'm creating the variable. So I'm going to start from the top and reread my data. Back to three variables. I want to see the output when I create it, so I put a parenthesis here at the end and one at the beginning. See how, when I have my cursor on this parenthesis, it highlights this one? This entire command is now surrounded by parentheses. If I run this, it prints the output, but it also adds that fourth variable at the same time. So it will do both at once for you. That's just another thing you can do if you really want to see the output when you run the command: you put parentheses around the whole thing. Okay. But another way you could check it is to go and click this view here. Notice, whenever I'm clicking this, it runs this command. So you can also see your data programmatically by using the View command. I'll get rid of this and do the entire thing programmatically. It's View, with a capital V, and then df in parentheses, and it opens that for me. So, similar to when we set our working directory, R is using your clicks to run a command that you could also just run on your own, if you'd like. So we've created our factor. Now we're going to do ggplot. We want to make a plot using the ggplot2 package. So, everyone, go ahead and run this library command. I just want to make sure it's loaded, so it will have run here. Sometimes it will give a warning that the package was built under a different version of R, or something like that.
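The parentheses trick from earlier in this passage can be sketched like this. The data and the column name exposure_group are made up for illustration.

```r
# Hypothetical data: 0 = control, 1 = case
df <- data.frame(exposure = c(0, 0, 0, 0, 1, 1, 1, 1))

# Without outer parentheses an assignment prints nothing.
# With them, R performs the assignment and also prints the assigned value.
(df$exposure_group <- factor(df$exposure,
                             levels = c(0, 1),
                             labels = c("Control", "Case")))

names(df)   # the new column has been added as well

# View(df)  # opens the spreadsheet-style viewer; interactive, so left commented out here
```

This is mostly a convenience while exploring; in a finished script it is common to drop the outer parentheses so the script runs quietly.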
But once that's run, go ahead and click yes. I want to make sure we're all on the same page with ggplot ready to go, so that's library(ggplot2) there. You can also look in your packages pane, and it will have a check mark now, having been loaded. Being on this list means it's an installed package; having the check mark means it's been loaded with library. So now all of the functions in ggplot2 are accessible to me in the R environment. Okay, so about half of us have it. Remember to click yes once you've run that. It's just library(ggplot2). Great. Good, good. Okay, I think we're ready to go. All right, so ggplot2 is awesome. It is a super powerful plotting tool. We've gone over this before: to use ggplot2 you need to have installed the package and loaded its functions into your R environment with library. This is one cheat sheet you can use for ggplot2. There are so many, and there are lots of very simple tutorials on how to use it available online, so I really encourage you to use them. I don't think you need to memorize a lot about ggplot2; you don't need to memorize every little function and option. I'm often googling myself: how do I make a violin plot with ggplot2, a heat map with ggplot2, et cetera. It is very easy to find these solutions online. What I want to impress upon you today is the overall syntax and logic of ggplot2, so that you know how to add to it and debug it, and get your head around creating even just the basic plot. Okay, so ggplot2 plots are made up of layers, and these layers are put on top of each other. If you've ever worked with a vector graphics program (I believe Publisher will let you do this, but I use Inkscape myself), when I export a ggplot object, I'm able to take those layers apart. So they are literal layers being plotted on top of each other.
Okay, so here you can see the first layer, the base layer, is the plot pane, the background. The next layer is the bars, and then I put error bars on top of my bars, and it results in a layered plot that shows my bar plot with error bars on the values themselves. Each layer contains a visual object, like I said. Here we've got the background, the pane; a bar is another visual object; error bars are more visual objects. These are called geoms in ggplot2, and they're added using the plus sign, so it's a somewhat different syntax than a lot of R programming. Like I said: bars, error bars, text, you can just add them on top. And you have to tell R which geom you want displayed when you define the layer. Geoms have aesthetic properties. You can make them fixed, so maybe you want all your error bars to be the color red, or you can make them variable, so you could say: I want all my control data to be colored red and all my case data to be colored blue in a scatter plot, for example. So geoms are flexible to these kinds of changes to their appearance. These aesthetic properties are referred to as aes, and aes is a function that you'll use within ggplot2 to give geoms those variable aesthetics. Okay, so I'm going to list out a lot of geoms. No need to memorize these; I think a handful will do, like geom_point, for example. So geom_bar creates a layer with bars, like a bar plot. geom_point gives a layer showing data points, like a scatter plot. geom_line gives you a line plot. geom_smooth gives you a line that summarizes your data, so it actually does some calculations to summarize your data. With geom_histogram you've got a histogram. You can add text. You can have a density plot, which is kind of like a histogram but smoothed to show the distribution. You can add error bars with geom_errorbar.
You can also add lines, so maybe you want a reference line on your plot; that's how you can add that. Suffice it to say, this is a short list from a very, very long list of geoms that can be added to your ggplot. You don't need to memorize them; this is just to illustrate the logic of how they're named and how you might even guess the name of the one you need. So the basic command structure with ggplot is that you start with ggplot. You use the ggplot function, and this defines the overall graph. The data is the first thing that you reference in this. So you say ggplot and then my data frame, whatever your data frame is, and then you use the aes function to define the aesthetic properties of your plot. In general, this is going to give your x-axis variable, your y-axis variable, and your color. Sometimes you'll have a plot that doesn't have two axes, in which case you wouldn't have both here; you could have just x or just y, for example. And then here, I'm saying color equals sex because I want the color of whatever geom I add to vary based on the sex column. So this is saying there is some column in my data frame called sex that I want color to vary with. One second, I just noticed a bug in my code here. I would have gotten an error if I used this, because there was no closing parenthesis. See here, I have an open parenthesis for my aes and I want to close it here. And then you also need to close your ggplot parenthesis here. So this is one complete command. Okay, I just wanted to correct that and save it. Good. So that's our function here; it's called ggplot, of course. We're referencing the data frame. You define your aesthetics. As I said, this is the foundation of your ggplot. This line is not going to draw any of your data on its own.
This is just telling ggplot: here's the baseline of the plot I'm creating. And then here is where you add layers to the foundation that you've created. Like I said, you add them using the plus sign. So literally, you write out your ggplot command, you add your geom, and then you have any additional geoms specified here. Here's an example from your own script. You call the ggplot function. You reference your data frame, that's df. And here we're defining the aesthetics. Our x variable is our exposure group, so on the x axis we want exposure group, and this is the factor variable we defined before. And our y variable is our biomarker value, so we want biomarker value shown on our y axis. Then we add a geom, and the geom we're adding is geom_boxplot, and we add it using this plus sign. Now you're adding layers to your foundation. So geom_boxplot is the next function, the geom here, and in it we're defining an aes where we say we want the fill, the color filling up our boxplot, to be exposure group. We actually could have put this up here in the base of our plot, and it wouldn't make a big difference. The main difference is that if you put fill equals exposure group in the aes up here, it applies to every geom you add, so every geom that has a fill will use it. But because we only have one geom, it's the same. So, just to repeat: fill could be inside geom_boxplot in its own aes, or we could have added a comma within the aes function up in our main ggplot call and had fill equals exposure group up there. And that should produce this graph. This is the graph you all should have produced in your pre-work, so this is where it came from. I'm just going to go ahead and make sure everybody has this graph. You all should have had it through your pre-work.
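A hedged sketch of the boxplot call walked through above, showing both places the fill mapping can go. It assumes ggplot2 is installed; the data values and the column names exposure_group and biomarker_value are illustrative stand-ins for the workshop data.

```r
library(ggplot2)

# Hypothetical data in the same shape as the workshop example
df <- data.frame(
  exposure        = c(0, 0, 0, 0, 1, 1, 1, 1),
  biomarker_value = c(12, 25, 31, 40, 56, 62, 70, 81)
)
df$exposure_group <- factor(df$exposure, levels = c(0, 1),
                            labels = c("Control", "Case"))

# Option 1: fill mapped inside the geom, so it applies to this layer only
p_layer <- ggplot(df, aes(x = exposure_group, y = biomarker_value)) +
  geom_boxplot(aes(fill = exposure_group))

# Option 2: fill mapped in the base call, so every layer added later inherits it
p_base <- ggplot(df, aes(x = exposure_group, y = biomarker_value,
                         fill = exposure_group)) +
  geom_boxplot()

p_layer   # printing the object draws the plot
```

With a single geom the two versions should look the same; the choice starts to matter once more layers are stacked on the same foundation.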
Just go ahead and click yes to make sure it's showing up here. Awesome. We got a lot of yes clicks, very rapid. All right. I think we're good. Very nice. Perfect. Okay, guys, so now you've got it made. Let's improve this. This isn't really a plot I would personally put in a paper of any kind, for a lot of reasons. First, these axis labels are our variable names, our data frame column names. They're not so nice, and they're also very small, so we want to fix that. The legend title, too: it's a name from that column, and it doesn't look right. We should make it more human readable. The background should be white; I don't really like the default ggplot background. The default colors, I don't know about the salmon and turquoise. I think we should also change that. So let's do it. The way we're going to do that is we take our original plot script that we have, and we add to it. We already had this plus sign and made a box plot. Now we're going to add another piece, a scale, called scale_fill_manual. This is going to change the colors of our box plot, and it's going to change the label of that legend that we have. So first off, we need the factor labels to correspond exactly to the colors. These are capital-C Control and capital-C Case, set equal to blue and dark orchid. This allows us to specify the exact color that we want for each exact factor level. That's why it's scale fill manual: we're very manually assigning colors and factor levels to each other here. Okay. Oh, I just noticed we have a no. I want to make sure the person who's having issues with this, could you check in with your TAs? Or if you'd like to open up your mic with the issue you're having, that is also fine. Hi. I think, without holding everybody up, can I just go to the TAs? Yeah, perfect. Because, all right, I pressed enter in the wrong window. Oh, that's fine. Yeah.
So if you press enter in the wrong window, it's no problem. I do want to hold everyone for this, just because it happens sometimes. So either you pushed enter and it didn't work, or you pushed enter here and then it's like, oh crap, it just separated the line. You just want to put it back together and press control enter once you're on that line. If you separate it, you know, sometimes it's just a mistake. I just do Ctrl+Z to undo that, and then control enter on the line. And if you're still having issues, I'll go ahead and let you go to a TA. Yes, I think so, because now it can't even find ggplot. Okay, perfect. Yeah. So I'll let one of the TAs bring you to a room. Thank you so much for bringing that up, though, because again, just because everybody didn't encounter the error this time, many of us will encounter these same errors. So I think it's really good for us all to talk about them together. All right. Thank you. How do I go into the breakout room? Yeah. Greg, are you able to? Yep. I just invited her. Wonderful. All right. You're in a breakout room. Perfect. All right, great. So, just moving on. And I'm sure she'll catch up, because I can send the code right along. Again, we're adding and specifying our scale_fill_manual values here, and then updating the legend title here. We want it to be Exposure. Once you've added this code in and gotten your plot updated, you don't actually have to use these colors. You can change the colors to red, white, pink, whatever you'd like; experiment with it. I put dark orchid. I like that one. Dodger blue is another nice one. Go have at it. But once you've been able to modify your plot to the colors you want and change your legend name, go ahead and click yes. And you should run this whole line. You can have your cursor anywhere on this line and run it, or you can highlight the whole line and run it.
But you should have a new plot in your plotting pane. Great. We've got two people already. All right. So the person who's a no, if you'd like, you can open up your mic, or we can have a TA take you to a breakout room. Yeah. An unexpected symbol. I'm pretty sure I copied exactly what you put, but maybe not. Yeah. So a few things could cause that. An unexpected symbol could mean you're missing an equals sign. Sometimes you just need to run the line again, honestly. But I think a breakout room would be good. Gabby, do you think you could take the people with the noes (we've got two noes here) to a breakout room? Yeah. Just, sorry, please ask your question on Slack. Just so we can keep track, as the TAs will be in and out of the breakout rooms, ask your question on Slack with a screenshot, and we'll answer it there or we will make a breakout room. Thank you. Yeah. Thank you, Rashad. Actually, that's a great idea, too. If you put a screenshot in the Slack, we can see it and we can debug together. So if you just take a picture, similar to what I'm showing here, of your line. Awesome. Love this. Okay. So what's happening here is around Case: it looks like, where Case is written, it's falling outside of your quotes. Ah, and the other thing that's happening is blue: you don't close your quotes on blue. So for Emma, for yours, you want to make sure it's open quote, Control, close quote, equals, open quote, blue, close quote, and then a comma, open quote, Case, close quote, equals, open quote, the name of whatever color, here it's dark orchid, close quote, and then close your parenthesis. And notice, this is a named vector. We've created a vector using concatenate here. Okay. All right. Let's look at Alex's. So Alexander, you're experiencing the exact same thing, except you have a double quote after dark orchid here. You're going to want to delete one of those quotes, and then that line should run just fine.
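For reference, a sketch of what the correctly quoted version might look like, with every name and every color wrapped in its own pair of quotes. It assumes ggplot2 is installed; the data and column names are made up, and "darkorchid" and "blue" are standard R color names.

```r
library(ggplot2)

# Hypothetical data in the same shape as the workshop example
df <- data.frame(
  exposure        = c(0, 0, 0, 0, 1, 1, 1, 1),
  biomarker_value = c(12, 25, 31, 40, 56, 62, 70, 81)
)
df$exposure_group <- factor(df$exposure, levels = c(0, 1),
                            labels = c("Control", "Case"))

# Named vector: "factor label" = "color". The names must match the factor labels exactly.
fill_colors <- c("Control" = "blue", "Case" = "darkorchid")

p <- ggplot(df, aes(x = exposure_group, y = biomarker_value)) +
  geom_boxplot(aes(fill = exposure_group)) +
  scale_fill_manual(values = fill_colors,   # manual color per factor level
                    name   = "Exposure")    # legend title

p
```

Building the named vector on its own line first is a design choice made here for readability; it also makes a stray or missing quote easier to spot than when everything is nested inside one call.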
Thank you. I just fixed that. Wonderful. Again, you know, just because you're the one having the error this time, I often have these errors myself. So it's very good for us to share. I'm just going to zoom in on this one. Yep. And then this one, Ran, you've got a double quote here before dark orchid. So another nice thing is R is highlighting where you have quoted. You want to make sure that control is a different color, right? Blue is a different color, case is a different color, dark orchid, but nothing else. These parentheses, these equal signs, these commas, these are all separate. Awesome. Let's see what we have now. Okay. Error: cannot use +.gg with a single argument. Do you want to try highlighting the whole line? I'm wondering what's going on there. That one is actually a bit more difficult. Values equals, I think what might be happening is, try putting them all on one line. So when I'm saying that, what I mean is just take this and kind of string them together. I wonder if that will fix it, because it seems like it's getting a new line issue here. And you're going to want to make sure that your new line is happening after the plus sign. So you have a plus sign and then new line, and a plus sign and then new line. And then when you run it, I think the cursor on the first line is probably best. Let's try it with the cursor on the second line. Nope, it should be fine too. All right. All right, everyone, I'm going to see how Nagla is doing here and then we'll, okay. All right. So could one of our TAs please take Nagla to a breakout room? I agree. They are both in the breakout room. That's why, please put your question on Slack. Like, I think by default, put it on Slack and then open your mic if you want to, but put it on Slack so we can track it. Yeah. Okay, perfect. So once they're back, they'll bring you into a breakout room, Nagla. Wonderful. And I'm going to keep going. Okay. Does it matter how many plus signs are used? Yes. So there's only one plus sign.
So each section here, so you have one section and then a plus sign. And then you'll have another piece of your code and then you'll add the next piece with a plus sign. Here it's showing two plus signs because this is R saying this is a new line. So this plus sign here on this side is not actually part of the code. It's just R speaking to itself, saying this is all part of the same command. And it's just coming together through that enter. So through those new lines that are there. Really good question, Christina. All right. We're going to go ahead and move on. And, oh, Greg, can you please take Nagla to a breakout room? No, sorry. Can you please go to the Slack and try to answer there? So first answer the questions with screenshots. Yeah. So a screenshot we already have for Nagla. That's why I'm hoping Greg can take her. Yeah, there are three screenshots before Nagla, that's why. They are all resolved. Okay. So maybe can you put a check or something, for when the TAs come back? Good idea. So guys, I'm going to share my Slack here. Or maybe you can just answer done on the thread. Yeah. Once you've got it, if it's good to go, I'm just going to do this on yours, Emma, just to show. If it's good to go, just go ahead and put a check mark. So if you've got an error, then just put a check mark when it's resolved. I'm going to take, yeah, perfect. That's great. Great. Okay. Thank you guys. Awesome. And Nagla got it working. She had an extra plus sign. Perfect. I love to see this. All right. So we're good to go. We've all updated our graph to have custom colors and also changed this legend title here. Okay. So I'm going to clear our check marks here and we're just going to move right along. So here it's updated. Still not exactly what we want. We still have some changes to make, but the colors are custom now. Okay. Now we want the background to be white and we want larger text. Okay. So this is still not publication quality.
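To make the plus-sign rule concrete, here is a rough sketch of the plot at this stage. The data values are invented for illustration, and the column names (biomarker_value, exposure_group) are assumed from how they are described in the session; the workshop's actual script may differ.

```r
library(ggplot2)

# Hypothetical stand-in for the workshop data (values are made up).
df <- data.frame(
  biomarker_value = c(42.1, 38.5, 47.3, 51.0, 55.2, 61.8, 49.6, 58.4),
  exposure_group  = factor(rep(c("control", "case"), each = 4),
                           levels = c("control", "case"))
)

# Each piece ends with `+`, so R knows the command continues on the next line.
p <- ggplot(df, aes(x = exposure_group, y = biomarker_value,
                    fill = exposure_group)) +
  geom_boxplot() +
  scale_fill_manual(values = c("control" = "blue", "case" = "darkorchid"),
                    name = "Exposure")
p

# If a line instead STARTS with `+`, R treats the previous line as a finished
# command and then complains that `+` was given a single argument.
```

The `+` that the console prints at the start of a continued line is only R's continuation prompt. It is never typed as part of the code.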
There are several ways to do it, but the way I actually like to do it the most is to do a theme set before I run my ggplot here. And this syntax is a little weird looking. You don't have to do it this way, but for me it's a bit clearer. Okay. So here this is the previous script. So this is what we used already. And what you want to add upstream of that is this theme set. So I say theme set. And then what theme set does is it says, okay, tell me what theme you want to use for this graph. And I will set it to that. And then within that I'm saying I want theme classic and I want my base size to be 20. So that means my text and everything. I want it to be starting from the base of point size 20. So that's like when you're setting your text size in a word document 20 is what I'm setting here. So you guys, you can choose a different base size. You can either even use your tab to autocomplete when you write theme to see if there's different themes you'd like to try. It's like theme black, white, whatever. But what you'll do next is you'll run this and then run this and it should update your plot. Okay. So once you've added this to before your plot, you run it. And now you rerun your plot. So all of this has to be rerun. Go ahead and click yes. You should have a nice plain background. You should have bigger text sizes. And your graph should be getting to the point that we'd like it at. Okay. So nice. We've got four people who it's working for. And again, just screenshot your error messages if you're running into issues. We can all learn from them. Great. We've got a lot of yeses. And for those of you who've gotten it to work, go ahead and play with different colors, different base sizes, and also different themes. There's so many different themes that are available to you. And maybe you find that you don't like theme classic as much as some other ones. There's other ones that give you grid lines that maybe are useful for certain contexts, for example. Okay. All right. 
It's just waiting for a few more people to get it. And don't forget to click yes once you've got it working. Are the, the way you have the brackets in the spaces, the way it jumps that you have pretty much is spread out on three different lines. Is that also very important as well? No, it's not. Great question. It's not a necessity. You could actually have these three lines in one single line. So theme set, theme classic, base size, and just close your parentheses. That's totally fine. This is just to illustrate that you can add these additional lines just to spread it out and make it easier to read in your code if you'd like. But thanks for asking. That's a really good question. And so when I, when I run it, do I highlight that entire block? You don't need to. Yeah. So if your cursor is just on the first line here, you want to highlight in general, like outside the, the parentheses here, but if your cursor is on the first line, it should run the full sequence. Awesome. Yeah, I just had the same problem that I had before, which I mentioned was when I run it, it just doesn't, it seems to do what it's supposed to do, but I don't see the plot. So if you've just run theme set, you just set the theme. So then you want to run the plot again, and it will now run it with the new theme. Okay. Yeah. We've got a few more people still working on it. The name equals part of, yeah, this part here. This is naming the, I suppose it's naming the factors. It's, it's for the legend title. It's saying, here's what all my factors are called, basically. So the legend, you want it to be called in this case, exposure. So it will update it to exposure where before it was, I believe it was exposure value or factor, perhaps. I'm going to change screens really quick here. Exposure group. So now it updated it to just exposure here. And so just to run what we wanted here. So like was said before, you just run theme set, nothing, you're not going to see anything. It's just updating the theme. 
Now you rerun your graph. And it will update that theme in your graph. So now we've got a lot bigger text. And it's a plain white background. And as I was saying before, you can actually change this to different themes. So instead of classic, I'm going to get rid of that. I'm going to tab. And I can see, wow, there's like a lot of different options. So let's see theme line draw, for example, okay. So theme set, theme line draw. See, I could run it with it anywhere on the line. Again, that actually doesn't have to be across three different lines. So here I could actually just have it on one line. It's just a little bit more difficult to parse apart with it all on one line. Now here, I'm going to run it again. So see, this didn't update when I updated my theme. This theme is just updated in the background. And now I'm running it again. And now this is what theme line draw is giving me. Okay. So there's lots of different themes to choose from. You can really find the right one for your specific application or use. Okay. Christine is asking, is there a way to see what all the options will look like for theme set? Yes. I think Google is the best way. There's likely a guide you could find online that will probably show each of them. There's also specific packages with different themes because I believe there's like a theme economist or themes that are actually matching the formats of different publications. So you can actually even have your graph match that publication's format. So yeah, there's lots of ways to look at the different themes. But other than running them within R, there's no way to see them unless you Google it, basically. So I would look up ggplot theme options. Alrighty. Yeah. Okay, cool, guys. So, oh, perfect. Thank you, Gabby. She's adding to our notes. All the themes. So gg theme, these will give you all the different themes you can use. But like I said, there's even additional themes you can always add. This is awesome. Wonderful. Okay. 
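A small sketch of the theme_set step described above. The base size of 20 and the two theme names come from the session; the object names old_theme and theme_names are illustrative assumptions.

```r
library(ggplot2)

# theme_set() changes the default theme for every plot drawn afterwards
# and invisibly returns the theme that was active before.
old_theme <- theme_set(theme_classic(base_size = 20))

# Writing the call on one line or spreading it over several is equivalent.
theme_set(
  theme_linedraw(
    base_size = 20
  )
)

# List the functions ggplot2 exports whose names start with "theme_".
# (This also picks up helpers such as theme_set and theme_get.)
theme_names <- ls("package:ggplot2", pattern = "^theme_")
theme_names

# Put back whatever theme was active at the start.
theme_set(old_theme)
```

Running theme_set on its own draws nothing; the plot code has to be rerun before the new theme shows up in the plot pane.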
So now let's keep going. We're not quite there yet. It's really a lot nicer than it was before. But I still don't want biomarker underscore value and exposure group with no space as my axis labels. Okay. So now we need to update these two. And the way you would do that is, again, plus sign. And it doesn't have to be a new line. We could just do plus sign and keep writing our code on to the right. But if your screen is anything like mine, it's pretty limited in space. So plus sign, new line here, lowercase xlab. And I want my x label to be exposure groups. And then I'm closing that. Make sure that this is in quotes and make sure the quotes are closing. As we saw, that's a major source of errors. And this in quotes, it can be whatever you'd like it to be. So here I'm calling it exposure groups. You could just call it exposure, for example, if you'd like. And then I'm adding another piece here. Again, these get added on top, right? I'm adding my ylab, lowercase ylab. And then this is going to be the label on my Y axis. And I want it to be biomarker value. Again, make sure you close those quotes. And to be clear, this is the previous script. So we wrote all this, we ran all this, added a plus sign. And now we're specifying our X and Y labels. And each of these is separate. So you could add only an X label if you wanted or only a Y label if you wanted. But here we're adding both with a plus sign on top. Okay? So once you've got it all looking good, go ahead and click yes. And one thing you may notice is you don't actually have to run theme_set again. So this theme_set, this actually is set now for any ggplot you make for the rest of your R session. And if you don't want that theme to be the one anymore, you have to now theme_set something different. So you'd have to rerun it with a different theme if you want to stop having your ggplots have that theme. Okay, so here this doesn't need to be rerun.
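Putting the pieces so far together, the plot with axis labels can be sketched roughly as follows. The data values are invented, and the column names are assumed from the session; only the labels, colors, theme, and base size are taken from what was said.

```r
library(ggplot2)

# Hypothetical stand-in for the workshop data (values are made up).
df <- data.frame(
  biomarker_value = c(42.1, 38.5, 47.3, 51.0, 55.2, 61.8, 49.6, 58.4),
  exposure_group  = factor(rep(c("control", "case"), each = 4),
                           levels = c("control", "case"))
)

# Set once; it stays in effect for later plots in the same R session.
theme_set(theme_classic(base_size = 20))

p <- ggplot(df, aes(x = exposure_group, y = biomarker_value,
                    fill = exposure_group)) +
  geom_boxplot() +
  scale_fill_manual(values = c("control" = "blue", "case" = "darkorchid"),
                    name = "Exposure") +
  xlab("Exposure groups") +    # x-axis label; any quoted text works
  ylab("Biomarker value")      # y-axis label; independent of xlab()
p
```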
You just need to add these two, xlab and ylab, here and then run the ggplot part of your code. Okay, great. We've got six people who got it already. Don't forget to click yes once you have your nice graph. And then, you know, again, play with the colors and the sizes, check different base sizes, you know, maybe it's too large really for the aspect ratio that you're using here, try different themes. Check out what Gabby linked to if you have time, or even write notes and contribute to the Google Doc. Awesome. Looks like we're getting there. A couple more. Great. Okay, so you should have something looking like this. So this is actually something you could put in a publication, right? You've got nice axis labels. It's all a size that you can actually read. You've got a nice legend here that is easy to read and understand. Notice on the legend, what ggplot does is it just puts a miniature version of your data's graph there. So these are just box plots. It's just giving two box plots and it's saying the controls are the blue box plot and the cases are the purple box plot here. Okay. And we've been able to control exactly the colors that we want inside here. And yeah, and then make it a nice white background because that's what we prefer in this scenario. So you read in a data frame, and we used the View function to see what your data frame looks like. You can also use commands such as length and dim to find out properties of your data frames. Sometimes you want to get the exact values of these out, because maybe you're doing something that will use them as variables. For example, the number of participants in your study could be the number of rows in your data frame. And so then maybe you want to save that as a variable, so you can access these things programmatically. We actually already used length to look at the length of our vector for exposure before.
Dim gives you the number of rows and columns in a matrix or a data frame. Okay. So rows are always first, columns are always second in R. Alright, so when dim outputs values, it's going to output two numbers, rows and columns. I've said this many times, but I cannot stop repeating it, because this will help you so much with, you know, making your coding faster and having fewer errors, if you're auto completing. So if you press tab, what's available to you in your environment will be made clear, you know, after a dollar sign or as you start to type your variable. So using tab auto complete is a very powerful aspect of RStudio. And I think a major reason for using RStudio in the first place. Alrighty, so here, go ahead and write this up and look at your variables. So you want to do dim(df), length(df$exposure), and dim(df$exposure), and then view the output down here. And go ahead and click yes once you've been able to get that. And this will just get us started. Make sure everyone's on the same page at this point. So dim, length and dim again. And if you're encountering errors, please just screenshot them, put them in the Slack. We'll debug it together. Or you can even just request a breakout room too in the Slack channel. So we've got some people who have already, great. Don't forget to click yes once you've gotten this output. And notice dim is two values, eight and four. So there's eight rows and four columns. That's great. That's expected. So dim(df$exposure), NULL. Yeah, we're going to talk about why that's NULL. Make sure to click yes once you've got it. We'll move along. Also, feel free to check out the dimensions or length of other objects, you know, maybe different variables, different dimensions. Excellent. Also, before we talk about it, you're free to speculate why the dimension of df$exposure would be NULL. Awesome. All right. So it looks like we have all the last few. I'm going to jump into what's going on. Yeah. So first, one tip is exactly, Ran got it.
First, I'm going to go back here. Why is this NULL? I was going to talk about it in a few slides, but here it's one dimensional. Okay. So there's no dimensions. Dim is looking for an object that has dimensions, like a matrix, an array, or a data frame. A plain vector doesn't carry any, so it's just going to give you NULL. Okay. So that is the correct output. That is what you should expect there. So length is going to be giving you the length of your object. You could try length with your data frame and see it's going to give you the number of columns only. So if you use length on a data frame, it's thinking about it like a list. But a dim of a vector, it's NULL, there's no dimensions. It's one dimensional. Okay. If you guys are using a function, and you're just not sure, like, what are the arguments to this? Can I use this? Like, what does it use? Maybe you get dim equals NULL, and you're like, is this right? You can use the question mark and then the name of the function, and then just enter, and it'll open in your help pane. So just to make this a bit more concrete here, question mark dim. In my help pane, it opens up dim. And it says it takes a matrix, array, or data frame. So here, under Value, if the input is a vector, you'll get NULL. So it'll be a NULL output for that. Okay. You can also assign dimensions. So that's what it's saying here. You could say, I want the dimensions of my object to be three and four. So you can also assign them here. Notice also what it's showing is an arrow instead of an equal sign. So here, instead of an equal sign, R also allows you to type a less-than sign and a dash to make an arrow, as well as just an equal sign. I'm using an equal sign in pretty much everything because it's only one character. So I just find that easier, but the arrow is also fine to use. Okay. So dim, we've got dim, length, and then the dim of exposure is NULL. Okay. Now, what does your data frame look like? How about names for your data frame?
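The dim and length behavior discussed above can be reproduced with a small stand-in data frame. The values and the column name sample_id below are invented for illustration; only the shape (eight rows, four columns) and the other column names follow what is described in the session.

```r
# Hypothetical 8 x 4 data frame standing in for the workshop's df.
df <- data.frame(
  sample_id       = paste0("S", 1:8),
  exposure        = rep(c(0L, 1L), each = 4),
  biomarker_value = c(42.1, 38.5, 47.3, 51.0, 55.2, 61.8, 49.6, 58.4),
  exposure_group  = factor(rep(c("control", "case"), each = 4),
                           levels = c("control", "case"))
)

dim(df)                # rows first, then columns
length(df$exposure)    # number of elements in one column (a vector)
dim(df$exposure)       # NULL: a plain vector has no dimensions
length(df)             # a data frame is a list of columns, so this counts columns

# Save a property as a variable so it can be reused programmatically.
n_participants <- nrow(df)

# Dimensions can also be assigned, which reshapes a vector into a matrix.
m <- 1:12
dim(m) <- c(3, 4)

# `=` assigns as well; `<-` is typed as a less-than sign followed by a dash.
m2 = 1:12

# ?dim   # opens the help page for dim in the help pane
```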
Head allows you to see the first six rows of data. Names is showing you the column names that you have. So for both, you have to tell R in the parentheses which data frame you want to see the names or the head of. Okay. So here, I have names(df), and it's giving me the names. And then this, I created this in Excel. And so for some reason, it's giving me this weird kind of umlaut character at the beginning. And then head, it's showing my first six rows. So that's what it is. Head, it just takes the top slice, the head of your data frame. Okay. The function str, S-T-R, stands for structure. This is my favorite way to actually dig in and look at your data. So if you ran str(df), what you're going to see is this. It's going to give you so many details. So it tells you, one, your object is a data frame. Two, you have eight observations, so that's eight rows, of four variables, four columns. So it's giving you this in very explicit terms. Then it shows you each of your column names. It's showing you the variable type. So here, my first one is a character string. When we read in our data, so that's here, we set as.is equals TRUE. Okay. That caused this character string to stay a character string instead of a factor. If we had set it to FALSE, then it would have automatically created a factor variable out of this, but we don't want to use it as that. These are just our, let's say they're participant IDs, for example, we don't really need a grouping variable for that. So we want it to just be kept as a character string. So structure actually shows us what is a character string versus a factor, which is the variable that we created, which is a factor that has two levels. The base level, or the level that everything will be compared to, is control. And so structure is very important for understanding your factors because it can actually tell you what's the base level and is it even a factor?
Because we could have had a column, for example, let's see if we look at it, my df, I'm just going to do View here of it. This, the way it shows up here, it could also just be character strings. This could in fact not be encoded as a factor and still look like this in this environment. So this structure function tells you how R is actually interpreting your variables. Okay, one sec, I just want to read in the Slack comments here. Yeah, so dim won't give you information for vectors because they're one-dimensional. Yeah, so I love that question, Carmen. I'm going to come back to that. You wouldn't have such a nice plot in general. And so structure is going to give you the first observations. So Carmen, I don't know if you guys saw, Carmen just asked, what if for this graph we created, instead of using exposure group, we actually just used exposure, which is a numeric variable. So we can just try that. Okay, so let's take exposure group away and see, if we didn't have a factor there, so what happened was, oops, here we go. So the first problem is fill is looking for a continuous value. So sorry, fill is looking for a grouping variable here. So it would have been upset by that. Okay, so that's what this first error is. So I'm actually just going to take away this aesthetic inside here. Okay, now let's see. What happens is exposure is now continuous. I don't know if you guys can see that. I'm going to also change my theme_set to classic. Okay, so what happens is, instead of groups, which I wanted, I wanted to have two box plots that are groups, I only have one box plot, because exposure actually is not a category anymore. It is a continuous value that's zero or one in our data, but it could take any value really. And that's how R is seeing it, seeing it as numeric. And so that's why having the factor is really important. So then here I'm going to change it back to group, exposure group.
And I'm not going to change the fill, see how that changed. Now I actually have two groupings. So I have the two box plots that I wanted to compare. And that's what making that factor does, having this factor encoded here versus using it as an integer there. That's why it's important, especially when you graph it, because R is going to be treating it now here as a group versus here as just a continuous number. And even though it says integer here, integers and numbers, like the alternative would be num, which is just a continuous number, R treats those the same. So either way it sees numbers, it's going to treat them as continuous numbers. And that's what was happening before. So in your plot pane, you can actually use your arrows to go to previous plots you plotted. So I'm just going to use this arrow back one and see again, exposure itself is a number. And so you're not able to group your values as you would like to. Even though you know zero is one group and one is another group, R thinks you just have numbers there. So here is what you need that for. You always have to recode them then into a factor? Or is there a way, like you said you imported it as is, could you tell it when you're importing it that something is categorical, if you've got them numbered as group categories based on numbers, like one, two, three, four kind of thing? Yeah. Yeah. So if they're numbers, like there, you will have to tell R explicitly. If they're numbers, R will always treat them as numbers when you read them in. So you will have to create a factor somehow explicitly. Okay. Yeah. Yep. stringsAsFactors equals FALSE is the same as as.is equals TRUE. So those two arguments, they speak to each other like that, you can use either. I just like as.is equals TRUE because it's short. What if the original spreadsheet had case and control instead of zero and one? Then you can use as.is equals FALSE. It will treat them as factors and that will be automatic.
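Here is a hedged sketch of the two situations from this exchange, using tiny invented inputs rather than the workshop file. Numeric codes need an explicit factor call, while text read in as factors gets its levels sorted alphabetically, which is why the baseline may need to be set by hand.

```r
# Case 1: groups stored as numbers. R keeps them numeric unless told otherwise.
exposure <- c(0L, 0L, 1L, 1L)
exposure_group <- factor(exposure, levels = c(0, 1),
                         labels = c("control", "case"))
levels(exposure_group)        # first level is the baseline

# Case 2: groups stored as text in the file (inline CSV text, for illustration).
csv_text <- "group\ncontrol\ncase\ncase\ncontrol"

as_char <- read.csv(text = csv_text, as.is = TRUE)             # stays character
as_fact <- read.csv(text = csv_text, stringsAsFactors = TRUE)  # becomes a factor

levels(as_fact$group)         # alphabetical order, so "case" comes first

# Set the baseline explicitly if it matters which group is the reference.
as_fact$group <- relevel(as_fact$group, ref = "control")
str(as_fact)                  # str() lists the levels with the baseline first
```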
The problem then that could occur, though, is you could have case as your baseline and control as your comparator group. So that could be an issue, but it will be a factor and it will plot correctly. Notice that the baseline group is plotted first here. And the comparator group is plotted second. And so again, if that's a preference of yours, if it matters which one's the baseline, then you may still have to set it explicitly when you read it in, or after you read it in, set it explicitly in your code. And stringsAsFactors equals FALSE is now the default in R versions 4 on. Thanks, Greg. Actually, that's good to know. Another thing though is, if you don't know what baseline R has set when you read in your data, so let's say you had case and control in your spreadsheet, you read it in, you can use the structure function to tell you which one is your baseline by seeing it here. Your first level is going to be printed first, control, case, like it's going to be setting them out so you would know what your baseline level would be. Yeah, integer, it's treated as a continuous variable. So integer versus non-integer, it doesn't really make a difference, even though R will specify that it sees an integer here. Yeah, these are all great questions. Okay, so here, I'd like everyone to go ahead and get the structure output out. Just so you guys can see it, get used to it. It's just a lot of information. And when I first started using R, I found it very gross and overwhelming. But I think it is the most useful kind of output you can get. So just go ahead and click yes once you have that, and then we'll move right along. Almost there, a couple more. All right. Good. Okay, I'm going to go ahead and clear, we're moving on. Awesome job. Now, let's access specific elements. So let's update this column's name, right?
Like this is so weird looking and, like, I don't even actually know off the top of my head how to create this character using my keyboard. So it's a pretty hard column name to actually deal with. So to see the first column name, we're going to think back to when we were dealing with vectors and data frames. Okay, so you can use names(df) to get the vector. And this one in brackets is the first element. Okay, so this will show you the first column name. To overwrite it, you just add equals, and then your new column name here. Okay. And once you've done this, go ahead and click yes. And you can run your str(df) again, structure again, and you will see that it is a new sample ID here, it'll be a new column name. That's one way. You can look in View. You can click on it, you can use head, anything you want, you can even just run names(df) to see it. But once you've overwritten the first column name, go ahead and click yes. Awesome. Great. It looks like a few people have it already. Very good. Great. And if you're running into issues, just screenshot it, share with the class. It's a pretty easy step though, this one. Wonderful. Okay. All right. I think that should be almost everyone. Make sure to stop us if you're running into issues at all. Okay. Aha. Great question, Carmen. Why no comma? Does anyone want to try to answer that in the Slack? I'll give you guys a minute to try to think of why no comma. It's a very, very good question. Yeah, close. It's a vector. So here we can see names(df). It's just a vector. So to access an element, we just want a single bracket, right? If it was a list, we'd want the double brackets, but a vector, we just want one bracket. Yeah, it's a great question. And the thing is, when you're using the data frame brackets, actually, you're not able to access the names part of the data frame at all.
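A minimal sketch of the renaming step. The data values are invented, the odd first column name is only an imitation of the kind of name described in the session, and the replacement name sample_id is an assumption about what the new name looked like.

```r
# Hypothetical stand-in for the workshop data (values are made up).
df <- data.frame(
  sample_id       = paste0("S", 1:8),
  exposure        = rep(c(0L, 1L), each = 4),
  biomarker_value = c(42.1, 38.5, 47.3, 51.0, 55.2, 61.8, 49.6, 58.4)
)
# Imitate an awkward imported name that starts with a stray accented character.
names(df)[1] <- "\u00ef..sample_id"

names(df)          # a plain character vector of column names
names(df)[1]       # single bracket, no comma: it is a vector, not a table

# Overwrite just the first name.
names(df)[1] <- "sample_id"

# Data frame brackets take [row, column] and index the data, not the names.
df[3, 2]           # third row, second column

str(df)            # the new column name shows up in the structure output
```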
So if we look at our df, when we use the square brackets, so here df, and we say, let's say we want the third row and the fifth, no, we don't have five columns, the second column, okay? Three, two. It's counting while ignoring the column and row names here. So we can't actually access those column and row names using those square brackets with the data frame. We have to actually access names and then, in that vector, access that element, okay? So great, great questions. All right. So I'm going to go ahead and clear all here. Awesome. Let's now look at some base R plots. Now that we've done this ggplot, let's go and look at what R actually just has as a baseline within R, because you don't always want a publication-ready plot. So graphing, yeah, it allows you to assess so many issues with your data and you should do it with ease and often. So these are some, pardon, different plots. So you could have just a histogram of your data. So this is taking in a vector. Plot, if you want a scatter plot. So you want to take in two vectors. One is your x-axis, one is your y-axis. A bar plot, right? You want frequencies by groups. So that's going to take a vector that is the different frequencies of each of your groups, a named vector. A box plot, like we just did. That'll take a formula. We're going to actually look at that and we're going to compare this box plot to the one we made previously. And then a strip chart is actually pretty cool. It's kind of like categorical histograms. So if you have multiple different categories and then continuous values and you don't want to use a box plot, you can make a whole histogram using a strip chart and it'll essentially show the density in each of those categories. So let's look at histogram, all right? So we don't want to forget to specify the data frame of the variable that we're getting. So we want a histogram of the biomarker value.
We specify the data frame and here we can use dollar sign, biomarker value. Similarly, though, we could also do hist, df, and then we know that biomarker value is the third column. So we could say we want all rows of the third column and that also will work, okay? So here's a histogram and you should get something like this, okay? So one tip again, use tab. I'm going to keep saying it. It'll just make your life easy because it's easy to make a mistake typing these out. And once you've got your histogram up, go ahead and click yes. Wonderful, okay. So it looks like we're good to go. So you use the dollar sign when you want to access an element that's inside an object, okay? So that was like kind of a weird way of saying that. Here's how I'll say it again. We want a column inside df. We want that column biomarker value that's from the df data frame. So we use the dollar sign to access that column, okay? So more specifically, here's df. We want to go into this column and this is what we want a histogram of. So we say hist, df, and then we say, okay, I want the biomarker value column. So I just push tab to get all of those options up. And then that's how I got it up there. Okay, all right, good. Okay, it looks like we've got it. So we'll move along. And you know, let's say we want to update this histogram. Let's say we're happy with it, but we want to make some alterations to it. So we then can use the same hist function. We have our variable to be graphed. This is from our previous time. Now we want the x-axis label. So this is similar to ggplot. But here instead of adding on geoms, the way it works in base r is it's just additional arguments within the hist function here, okay? And so main is the title of our graph. So we want to update it to the biomarker distribution. We want the label on our x-axis to be the biomarker and then units. And then we want the color that's filling in our histogram. Here I'm making it Dodger blue. 
You can make it any color you want, pink, white, blue, black, whatever you like. This next line is going to draw a line on top, okay? So abline, it draws a line on top of an existing graph. And here we want a vertical line and we want it at 50, okay? So let's say this is a pretty important value for our biomarker. So we want to specify this on our plot. And we want to have a line that's showing, you know, here's the distribution of the values above 50, here's the distribution below 50, and here's that line. We want our line to be black and we want it to be thick. So we want it to kind of be bold and show up beyond our graph, like kind of be marked on our graph. And we'd like it to be dashed. So we don't want it to just kind of blend in with everything in our graph, okay? So this should give you this graph. And once you guys have this graph, go ahead and click yes, and it doesn't have to be the same colors. Play with different line types even, you know, there's different types of dashes beyond two, you could change it to five or three. Line widths, make it really, really thick, try different colors. You can even move this around. You'll notice as you run abline again and again, if you change it, it's just going to keep adding a line. So if you want to make a new graph where you only have one abline, you've got to make your graph again, which means run your histogram again, and then run your abline again, because it is literally drawing a line on top of your graph here. Okay. And if you're running into issues, just screenshot it, we'll all go through it together. abline, it's just drawing a line on top. So if you only run this hist, it will just be a histogram. abline, it draws the line right on top of your graph. So if you write abline at 30 and you run it again, it'll draw another line down here.
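The histogram and abline steps can be sketched like this. The data values are made up, and the exact title text, axis label, and line width are assumptions; the vertical line at 50, the black color, and the dashed line type of 2 follow the session.

```r
# Hypothetical stand-in for the workshop data (values are made up).
df <- data.frame(
  sample_id       = paste0("S", 1:8),
  exposure        = rep(c(0L, 1L), each = 4),
  biomarker_value = c(42.1, 38.5, 47.3, 51.0, 55.2, 61.8, 49.6, 58.4)
)

# In base R the title, axis label, and fill color are arguments to hist().
h <- hist(df$biomarker_value,
          main = "Biomarker distribution",
          xlab = "Biomarker (units)",
          col  = "dodgerblue")

# abline() draws on top of the plot that is already there.
abline(v = 50, col = "black", lwd = 3, lty = 2)   # thick dashed vertical line

# Running abline() again adds another line; it never removes the earlier one.
abline(v = 30, col = "black", lwd = 3, lty = 2)

# df$biomarker_value and df[, 3] select the same column.
same_column <- identical(df$biomarker_value, df[, 3])
```

To get back to a plot with a single line, rerun hist first and then abline once.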
So you can have multiple ablines, and they'll just keep drawing line, line, line, line, line. You could add a horizontal line. Maybe two is a special value for your thing, so you could make it h equals two, and then you'll have another line drawn horizontally at two on the y-axis. So it's just drawing a line. It's very useful, though, because often you'll want a line somewhere on your graphs. And don't forget to click yes once you've got it up and going. But feel free to experiment. All right, and we've got a no here too. Either Greg or Gabby can pull you into a breakout room for our nos, or feel free to put it on the Slack. So yeah, Ran, it looks like you started to write it and it didn't close out. Aha, you didn't close your parentheses on your first hist. So you want to get rid of those pluses. Go ahead and escape your pluses, like I showed before, where it just keeps plusing on down. Escape out of that, and then just run hist, but make sure you close your parentheses. So have your whole hist command run with your parentheses closed. And then abline is a totally new command that's also self-contained. The R colors are universal. So the colors that I'm writing in here, you can use them with ggplot, you can use them with just base R. They'll always be the same. You can also use hexadecimal codes if you're familiar with them and comfortable with manipulating your colors that way. But yeah, the colors are universal here. Yeah, and some have special talents. That's right. Didn't get the abline yet? Yeah, so just make sure you get the hist closed, and then the abline closed. That looks awesome. So you got your hist. And now for your abline, there's a quote at the very end. If you look at the lty equals two, there's a quote there. So it's opening up a quote, and it's looking for that to get closed.
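Here is a small sketch of stacking several ablines on one plot and of using a hexadecimal color in place of a color name. The data are simulated with assumed column names, and the particular line positions and colors are only examples.

```r
# Hypothetical stand-in for the workshop's df (column names are assumed)
set.seed(42)
df <- data.frame(
  id              = 1:100,
  exposure        = rep(c(0L, 1L), each = 50),
  biomarker_value = c(rnorm(50, mean = 45, sd = 10),
                      rnorm(50, mean = 55, sd = 10))
)

# A named color and its hexadecimal equivalent are interchangeable
dodger_hex <- rgb(t(col2rgb("dodgerblue")), maxColorValue = 255)

# Each command is complete on its own: every parenthesis and quote is closed
hist(df$biomarker_value, col = dodger_hex)

# Every abline() call adds one more line on top of the same plot
abline(v = 50, lwd = 3, lty = 2)       # vertical, dashed
abline(v = 30, lwd = 2, lty = 3)       # a second vertical line, dotted
abline(h = 2, col = "darkorchid")      # horizontal line at 2 on the y-axis
```

If the console shows a plus sign instead of drawing anything, a parenthesis or quote is still open; press Escape and rerun the complete command.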
So if you get rid of that quote, I think it should add an abline. But also make sure to escape out of your plus sign there, because if you keep adding comments or commands onto that, it's just going to keep plusing. Wonderful. All right. Looks like we're good to go. Great job, everyone. And I think that was a useful debugging exercise for everyone. Okay, so now let's look at box plots, because we drew a box plot previously. Now we can see that you don't actually need to make box plots in the exact same way as we did before. So here I'm showing box plots using base R. I have my biomarker value, and I want it split by my exposure. Notice I'm not using my factor variable here. Again, use tab, and you should be coding really quickly. Once you get a box plot up, go ahead and click yes. And if you have this box plot up, you can start trying things with the colors or with the x labels. Everything we did in the previous graph with the different arguments you can add, you can add here as well, because base R has all these plotting options and they're universal. So if you'd like, you can change colors, axis labels, everything, and experiment with them while we're waiting for everyone else to get their box plot up. Yeah, so this tilde here is saying split this into these groups. And the significance is that if you don't want groups, you would just leave that out. You could just have a box plot of your biomarker values by themselves. The other thing that's interesting is that this is an integer, and it is allowing you to use it as a grouping variable here. So base R is different from ggplot that way. You don't have to have a factor for your box plots. If you have integers, then it's going to treat them as factors, essentially, for this. So just two things of note here. All right, looks like we're getting it. Okay, so you should have something like this.
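A minimal sketch of the tilde syntax, again on simulated data with assumed column names. The exposure column is deliberately left as an integer to show that base R will still group by it.

```r
# Hypothetical stand-in for the workshop's df (column names are assumed)
set.seed(42)
df <- data.frame(
  id              = 1:100,
  exposure        = rep(c(0L, 1L), each = 50),   # integer, not a factor
  biomarker_value = c(rnorm(50, mean = 45, sd = 10),
                      rnorm(50, mean = 55, sd = 10))
)

# value ~ group: one box per distinct value of exposure
bp_grouped <- boxplot(df$biomarker_value ~ df$exposure)

# Without the tilde and grouping variable: a single box for all values
bp_single <- boxplot(df$biomarker_value)
```

The group labels on the x-axis come straight from the integer values, which is why the plot shows 0 and 1 rather than descriptive names.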
So we've got a zero and a one here, right, which is what our exposure is. But we know it's a control and a case here. So the factor labels aren't shown, right? And it's all white because we just haven't added color. We haven't specified anything. All right. So just like last time, we want to update these axis labels, make them bigger, update the colors, and include a legend. And that's it. Sorry, I thought I had more there. So to do that, we do it similarly to how we did with the histogram. We've got our boxplot function exactly as before. This is exactly what you just ran: df, dollar sign, biomarker value, and then tilde, df, dollar sign, exposure. You've got your x variable there. If you want, you could change this to exposure group if you would like control and case to be labeled on your axis. You've got your y-axis label, so you want biomarker, units. Oops, here we go. Your x-axis label is your exposure. And we want the labels to be bigger. The way we do that is using cex.lab. cex is a multiplier for the text that you're going to see, so when you set cex.lab to 1.5, it's saying make the labels one and a half times larger. This is different from ggplot, where we actually set a base point size. So it's a distinction between base R and ggplot. And we want our colors by group. Here I just set them to the same ones we used before. But again, explore, and feel free to use the ones that you think are best and look right to you. Okay, then to add a legend. It's actually a new command, so this runs all on its own. And the legend is layered on top. So it's another thing you add on, similar to the abline and how it's drawn on top of an existing graph. The legend is going to be drawn on top. If you run this again and again while changing things about it, let's say you're changing the location, it's just going to keep putting new legends on your plot.
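The labeled and colored box plot might be sketched like this. The data and column names are simulated assumptions, and the colors blue and darkorchid are taken from the legend description in this session rather than from the actual slide.

```r
# Hypothetical stand-in for the workshop's df (column names are assumed)
set.seed(42)
df <- data.frame(
  id              = 1:100,
  exposure        = rep(c(0L, 1L), each = 50),
  biomarker_value = c(rnorm(50, mean = 45, sd = 10),
                      rnorm(50, mean = 55, sd = 10))
)
# Factor version with readable labels, for Control/Case on the axis
df$exposure_group <- factor(df$exposure, levels = c(0, 1),
                            labels = c("Control", "Case"))

bp <- boxplot(df$biomarker_value ~ df$exposure,
              xlab    = "Exposure",
              ylab    = "Biomarker (units)",
              cex.lab = 1.5,                     # axis labels 1.5 times larger
              col     = c("blue", "darkorchid")) # one color per group, in order

# Swapping in the factor labels the boxes Control and Case instead of 0 and 1
bp_labeled <- boxplot(df$biomarker_value ~ df$exposure_group,
                      xlab = "Exposure", ylab = "Biomarker (units)",
                      cex.lab = 1.5, col = c("blue", "darkorchid"))
```

Note that cex.lab scales only the axis titles; tick labels have their own multiplier, cex.axis.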
If you want to update the legend, you need to make the plot again, and then you need to add the legend on there anew, okay? And so within the legend command, we use the legend function. We have our x-axis location here and our y-axis location here. Experiment with moving it around. You have a lot of leeway to move it to where you'd like it to go. Then the legend labels. Here I'm going to say control and case. And they're blue and dark orchid. These have to match in order, okay? So that's what it's going to be laying out for you. And the title is exposure. All right. So go ahead and do this. Once you have your graph that has two box plots, and you've got a legend there, go ahead and click yes. Feel free to experiment. Again, you can change this to exposure group if you want, use different colors, change the sizes. You can make the labels two times larger, three times larger if you want. Move your legend around if you'd like, change the legend title, and see what that does. So really, I encourage you to experiment with it. Try to break it, and see if you can fix it again too. Awesome, looks like someone got it already. Great. Don't forget to click yes once you have it. And then once you have it, start experimenting. Nagla, it looks like it's cutting off a bit of it. It may be that your legend is covering a bit of your plot, because that is what it's supposed to look like in principle, but then there's that big cutout part. Yes, Carmen, you could do this using exposure group as well. Yeah. Give it a try and see how it changes if you do that. Awesome. And don't forget to click yes once you've got it. Oh, great. Let's see. So, Ran, in legend, you're missing a quote after control. You want to make sure that closes its quotes, then a comma, and then quotes around case. Like you have it already, the rest is good to go. But that's why it's just making it one long quoted string.
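A sketch of the legend step follows, using simulated data with assumed column names. The x and y positions here are computed from the simulated values so the legend lands inside the plot; in the session the numbers were typed in by hand for the real data.

```r
# Hypothetical stand-in for the workshop's df (column names are assumed)
set.seed(42)
df <- data.frame(
  id              = 1:100,
  exposure        = rep(c(0L, 1L), each = 50),
  biomarker_value = c(rnorm(50, mean = 45, sd = 10),
                      rnorm(50, mean = 55, sd = 10))
)

# Redraw the plot first so only one legend ends up on it
boxplot(df$biomarker_value ~ df$exposure,
        xlab = "Exposure", ylab = "Biomarker (units)",
        cex.lab = 1.5, col = c("blue", "darkorchid"))

# legend() is its own command, drawn on top of the current plot.
# Labels and fill colors are matched by position, so keep them in the same order.
lg <- legend(x = 0.5, y = max(df$biomarker_value),
             legend = c("Control", "Case"),
             fill   = c("blue", "darkorchid"),
             title  = "Exposure")
```

Each label needs its own pair of quotes with a comma between them; a missing quote after the first label turns the rest of the line into one long string and leaves the console waiting with a plus sign.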
Yes, the position of the legend is determined by these x and y values, so you can move them around. So also give that a try. Sorry, I must have missed that. Thank you. No worries. And if you find it's not fitting, you might get an error saying the figure margins are too large. Just make your plot window bigger. That should fix it. Excellent. We've got 10 people. Awesome. Yeah. So, Dalia, what you're seeing there is just because the plot window is very small. If you make it larger, a lot of that will be fixed, and it'll look a lot nicer. You may have to rerun it once you've made the plot window larger, though, because it will be rendered in a way that fits that window better. Great question: is it possible to add an asterisk to show statistical significance on the plot? Yes. I need to find it, but essentially, I think it's the text function. And it's similar to legend. You tell it the point on the plot where you want it to write, and you give it the text in quotes. So that is certainly possible to do. Is there an easier way than trial and error for the position with the x and y? Well, one thing is to actually look on the axes for where you would want it to be. This is a different plot, so I'm just going to get this plot up. So here, it's at 1.8. So this is 1, 2. No, it's not. That's not 94. So it should be the top right, I believe, is where it's putting it. So think about the legend as a square, and it's placing it there. So if I wanted it lower, like down here, I could change this 94 to, let's say, 14. So now this has been moved down. So this is 14, and that's where that point is, if that makes sense. So it's a little bit less trial and error, but honestly, a lot of it is trial and error. Also notice, like I said before, it's just drawing on top. So it just added a second one.
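To make the asterisk answer concrete, here is a hedged sketch using text(), plus one way to cut down on trial and error: legend() also accepts a position keyword such as "topright" in place of numeric x and y. When numeric coordinates are given, the default justification puts the top-left corner of the legend box at that point. The data, column names, and the asterisk position are assumptions for illustration.

```r
# Hypothetical stand-in for the workshop's df (column names are assumed)
set.seed(42)
df <- data.frame(
  id              = 1:100,
  exposure        = rep(c(0L, 1L), each = 50),
  biomarker_value = c(rnorm(50, mean = 45, sd = 10),
                      rnorm(50, mean = 55, sd = 10))
)

top <- max(df$biomarker_value)

# Leave some headroom above the data for the annotation
boxplot(df$biomarker_value ~ df$exposure,
        xlab = "Exposure", ylab = "Biomarker (units)",
        col  = c("blue", "darkorchid"),
        ylim = c(min(df$biomarker_value), top + 10))

# text() writes a string at a point in plot coordinates;
# the boxes sit at x = 1 and x = 2, so 1.5 is midway between them
text(x = 1.5, y = top + 5, labels = "*", cex = 2)

# A keyword places the legend in a corner without guessing coordinates
lg <- legend("topright",
             legend = c("Control", "Case"),
             fill   = c("blue", "darkorchid"),
             title  = "Exposure")
```

The asterisk here is purely decorative; whether it is warranted depends on a test you would run separately.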
So you just need to rerun it to have only one. So when you're in base R, as opposed to ggplot, can you put the legend beside the plot, or not really, and it has to be somewhere within it? Yeah, unfortunately, I would say that's a major drawback of this method. Yeah. Thanks. Yeah. How are we doing? All right. Don't forget to click yes once you've got it. And I'll give everyone two more minutes. Anyone who's got it up, just keep experimenting, trying new things, changing the cex.lab, the position, different colors, labels. You don't have to select the whole command and press control, enter each time. So let's say you've changed it. You can just have your cursor at the top here, or just anywhere in the command, and run it again that way. See, I just have my cursor anywhere, control, enter, and it ran again. You can tell because it erased the legend. And now just control, enter, and run again. Alrighty. So did that run your entire script again? Sorry. Great question. Yeah, no, it just ran the one command. Okay, here I run this. It just ran that. And the way I know that is because down here, I can see what was run. So I'm just going to write a comment here. You can tell comments because they have little hash symbols in front of them. So this is a comment. So then I have that comment there, and then I'll just run this again. It just ran the one command, with the comment above it here. Interesting that it automatically ran that comment above, but it didn't run the rest. Thanks. Yeah, thank you for asking really good questions, you guys. It didn't run the legend. Exactly, it just ran the one command. And then if I want to run the next command, I have to do that separately. So I do control, enter, and now it ran that one. So see, it's just hopping one command each time. It will only run one little piece of the code, and that piece is self-contained by these parentheses around it.
So that piece is all that will be run with that control, enter. Okay. All right. So the shape and location of the legend change with your plot window, as you've all probably noticed. So the plot window is very important. It's also important for ggplot, a little bit less so. But as you can see here, it is extremely important. In some cases, it may be that the plot window is so small that part of your graph gets completely covered by the legend, even in this location. So yeah, you can try out different locations. Unfortunately, a lot of it becomes trial and error. And then the cex changed the size of these axis labels as well here. Okay. So now you guys are going to practice on your own. You're basically going to do what we did this morning in your own code. On the course website, there is assignment data one. It's a CSV file. You're going to download that data, set your working directory, read in your data, inspect the data frame, and create a responder factor that has two levels, zero and one. The baseline is a non-responder and the comparator group is a responder. So you want to know the difference between the responders and non-responders to some treatment. And then you're going to create a box plot of your biomarker relative to your response. You can do that using base R, or you can do it with ggplot, using the ggplot functions. Okay.
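The steps of the exercise can be outlined as follows. This is only a sketch on made-up data: the file name, the column names (response, biomarker), and the values are all hypothetical, and the real assignment file from the course website will differ. A small CSV is written to a temporary folder first so the example runs anywhere.

```r
# Create a fake CSV so the sketch is runnable; in the workshop you would
# download the assignment file instead (file and column names are hypothetical)
set.seed(7)
csv_path <- file.path(tempdir(), "assignment_example.csv")
write.csv(
  data.frame(id        = 1:40,
             response  = rep(c(0L, 1L), times = 20),
             biomarker = round(rnorm(40, mean = 50, sd = 8), 1)),
  csv_path, row.names = FALSE
)

# 1. Set the working directory to the folder holding the file, then read it in
old_wd <- setwd(dirname(csv_path))
dat <- read.csv(basename(csv_path))
setwd(old_wd)  # restore the previous working directory

# 2. Inspect the data frame
str(dat)
dim(dat)
head(dat)

# 3. Two-level factor: the first level (0, non-responder) is the baseline
dat$responder <- factor(dat$response, levels = c(0, 1),
                        labels = c("Non-responder", "Responder"))
table(dat$responder)

# 4. Box plot of the biomarker split by response, in base R
bp <- boxplot(dat$biomarker ~ dat$responder,
              xlab = "Response", ylab = "Biomarker (units)",
              col  = c("blue", "darkorchid"))
```

The order given to levels is what makes the non-responders the baseline group, so it is worth checking with levels or table before plotting.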