I'm just going to jump right in. We're going to be doing linear modeling this afternoon, and it's going to be a lot like the t-testing examples. We're going to have formulas in R, and we're also going to see how we can investigate the output of the model and print it out, similar to the t-tests we did: investigate it with str() and pull out the components of our linear model that we want to save down the line. By the end of this lecture you'll be able to fit a linear model and graph the results.

Yeah, but if you use non-linear models, say you're doing logistic regression, this will be similar. You'll be changing the functions a little bit. Often you're interested in the test of your markers against, say, a drug response, but you also have other control variables, like sex and more. You want to use a linear model so that you can control for those other aspects. So linear models are very important; I'm sure I don't have to convince you of that.

All right. First, though, let's plot two markers against each other and see how they relate to each other.
Here we're going to use a different base R function, which is just plot. We're going to use these three commands, and we're going to be setting our par values. We could do this with a comma and another argument to par, or we can do it on separate lines; I'm doing it on separate lines. We haven't seen the first one before: this graphing parameter is mar, which sets the margins around our graph. Then par(mfrow = ...) we recognize: one by one, meaning one row and one column of plots, so just a single plot. So here we specify the margin around the plot, how tightly packed or spread out it is in your plotting pane, and then the number of rows and columns in your plotting window. We just want one plot. Then we're going to do a scatter plot: this is our x variable and this is our y variable, and we're going to make a plot like so.

One moment. Now, we know the number of rows and columns for mfrow. For par(mar = ...), the margin around the plot is this margin: up around here, around this side, around this side. The margin includes the space underneath your axis labels here. We specify it using par and the mar argument. Oh sorry, I'm selecting zoom. There we go. Bottom, left, top, right: that's the order these numbers represent. You can experiment with this. If you run these lines together, you can plot it one time and then experiment: if your bottom margin is 1, for example, how does that change your plot layout? And if your bottom margin is 10, you can see how that changes it. Go ahead and click yes once you've created this plot.

Once you've created this plot, because it's a base R plot, you can use all those
base R parameters to change aspects of it. You can give it a title, and you can change xlab and ylab, the labels on the axes, if it's easy for you to get this plot going. You can also use cex.lab if you want to change the label sizes. You can change the color: add col and put a color in. We saw main earlier today, if you want to change the title. So you can do a lot to revise this plot, and you can also play with your margins if you're able to get this plot going. Go ahead and click yes once you've got it, and I'm going to be back in one minute.

So I'm trying to add the xlab and ylab. I wrote it down, but it doesn't add it, and I'm not sure if maybe I did it not as part of the command?

Yeah, you want to make it part of the command, so add a comma and do it within plot.

Okay, within the parentheses?

Yes, exactly, within the parentheses.

Okay, great.

Can we try something? It's not counting the yeses for me now. Rasha, you're not the host? Rasha, do you mind making me a co-host? Thank you. Perfect, that's great. Okay, good, got it.

All right, I thought this was really simple for you guys, and it was. Good. So you've got your plot up. Now let's update our points to be filled-in circles and not black, and let's make our axis labels bigger, reading marker one and marker three, like you've already started to do. We've done this before. Point color: you can just do col = "blue", easy, and choose any color you'd like. Point type: this is on any of the cheat sheets for the base R graphing parameters. Each of these numbers is a different point type. See how I chose 19 here; that's the big filled-in point. The default is 1 when you're using base R
plot, so you can choose any of these. Here I'm making it 19, this filled-in point, but you can even choose 22, which gives you a black outline and then the fill color that you put in. If you choose these ones over here, you may be able to specify col and a fill as another parameter and have two colors on those points. I'm not a hundred percent sure about that. But if you change col and pch is 1, it will just change the color of the outline, whereas if it's 19 it will change the entire point.

All right. So here you can see it's filled-in blue points, and we know cex.lab, we know how to do that. As Ran pointed out, if you put these arguments outside of the plot command, they're not going to run. You want it all inside the parentheses, with the separate arguments separated by commas. Go ahead and click yes once you've got it nice and revised for yourself. Some of you may also have to change the margins a little bit to make it fit. Awesome.

Going back to these point types: you can play with the different point types. One thing that's fun is that in future you could make your point type vary based on your data, so it could be little triangles for some and squares for others, for example. Point types are fun to use for that.

All right, about half of the class is there. If you get it, keep playing with it, just keep trying things. If your axis name isn't showing up, you need to add to the margin on that side. On the left side, that's the second number here: if you make this number bigger and run it, it'll push the plot over so you have more room. The label might be getting cut off. It's a very, very common thing with base R plots that you don't have the margins you'd actually want. Wonderful.
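The plotting steps described above can be sketched roughly as follows. This is a minimal sketch on simulated data, not the class dataset: the data frame name df2, the column names marker1 and marker3, and the specific margin values are all assumptions made for illustration. It also shows that for point types 21 to 25, base R takes the outline color from col and the fill color from bg.

```r
# Simulated stand-in for the class data. The names `df2`, `marker1`, and
# `marker3` are assumed for illustration only.
set.seed(1)
df2 <- data.frame(marker1 = rnorm(50))
df2$marker3 <- 0.5 * df2$marker1 + rnorm(50)

# Margins are given as c(bottom, left, top, right), in lines of text.
# par() returns the previous value, which we keep so we can restore it.
old_par <- par(mar = c(5, 5, 3, 1))
par(mfrow = c(1, 1))  # one row, one column: a single plot

# All styling arguments go inside the plot() call, separated by commas.
plot(df2$marker1, df2$marker3,
     col = "blue",      # point color
     pch = 19,          # filled circle (the default, 1, is an open circle)
     cex.lab = 1.5,     # larger axis labels
     xlab = "Marker 1",
     ylab = "Marker 3",
     main = "Marker 3 vs. Marker 1")

# For pch 21 to 25, `col` is the outline and `bg` is the fill color.
points(df2$marker1[1:5], df2$marker3[1:5],
       pch = 22, col = "black", bg = "orange")

par(old_par)  # restore the previous margins
```

If an axis label is cut off, increasing the matching entry of mar (the second entry for the left side) and re-running the lines is usually enough.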
All right, let me give everyone a minute or two. Okay, looks like we're getting it. So we've got our plot. It should look something like this; maybe you chose a different color, maybe you made your points even bigger, but pretty close to this. The reason I'm having us do a plot first is that our linear model is going to create a line through this plot. When we plot our data first, we can think about it: do we see a correlation here? Is it positive, is it negative? What do we expect the coefficient in our linear model to be?

From here, we're going to fit a linear regression of marker two versus marker three, and the way we do that is just using the lm function. We're going to save the model output as an object. This means that when you run this command, R is just going to create this object in your environment, and you're not going to see anything else. If you want to see the output of the linear model, remember that you can put parentheses around the entire command and it will print that way. This is an R formula: we have our y variable, our outcome variable, here, and then our x variable here. If we had more variables, say you want to include a control for sex, like df2$sex, you can just add it with a plus sign on this side: df2$marker, then plus df2$sex, for example, all within the parentheses.

Once you've created this linear model object, go ahead and click yes, and investigate the output to see what you've got. You can also experiment with changing your linear model. You can add some controls in there, maybe for site or sex or other things, and just see what you get. That said, we will be using this specific linear model object down the line, so if you do change it, save it to a
different object name. Wonderful, it looks like everyone's getting it. We've got a few more people. Don't forget to click yes once you've got it, and if you're running into issues, feel free to take a little screenshot and put it up for us to look at, because even if we're not dealing with that issue right now, I'm sure we'll all deal with it in the future. These bugs are very common to us all.

Yep, the output is just going to be this object, saved, so it'll be in your environment pane. Once you have a linear model fit, I'll just run it here, in my global environment I have linmod, and it's a list, just like my t-test is a list. All of these analyses create list outputs, so if you did a generalized linear model it would also be a list output. That's all you'll see, and once it's there just click yes.

So this is a tilde. You don't want a dash, you want a tilde in between. Formulas in R always need a tilde separating your y variable from your x variables. Okay, looks like we got it. Wonderful.

Ran? Yeah, make sure you save it. Oh yeah, no, that's a different object. You added sex, and it's because it's not df, it's df new, so you're referencing the wrong data frame. Okay, so it looks like we got it.

Now let's work with this object. If you run summary(linmod), it will give you a lot of detail about your linear model. This is really useful. I really like the summary output of linear models and generalized linear models. Everything I'm saying for linear models goes the same if you're doing, say, a logistic regression with a case-control outcome. It gives you the call, so the formula that you ran, just so you know. And note that it says formula. That's because this is an R formula:
it's got an outcome, a tilde, and then some predictors. It gives you the residual distribution. This can help you see whether the residuals are centered around zero and roughly normal, as a diagnostic on your linear model. Then it gives you a really detailed overview of your coefficients. The first one is the intercept, so this is your y-intercept. You get an estimate, the standard error of your estimate, and the t value, which is computed from these two: the estimate divided by its standard error. And then it says Pr(>|t|), the probability that the absolute value of t is greater than the value given. This is the strict definition of a p-value: the probability, under the null hypothesis, that you would observe a greater absolute value of your t statistic than what is given there. Here our p-value is quite significant for the intercept. That's very common and not particularly of note.

What's really nice is our marker two. We see it has an estimate, standard error, and t value, and our p-value is significant at the 0.05 level. So there is a lower than five percent probability that we would observe this by chance if there were actually zero relationship between these two variables. The null hypothesis is that the estimate here is zero, and this is saying it seems that it's not zero: we would reject the null hypothesis that it's zero at five percent.

Here we have significance codes that help us along with that. These stars represent a certain level of significance. If you get a dot, it's significant at the 10 percent level. If you have nothing, it's not significant. If you get one star, it's significant at the 0.05 level; that's why we're seeing that star there. Similarly, two stars is the 0.01 level and three stars is the 0.001 level, and that's what we see here for a very significant intercept, again not really of note. We also get other details down here about the total
fit of the model. Down here we have our residual standard error, fine. We also get our R-squared and our adjusted R-squared. This tells us how much of the variance of our outcome, marker three, is explained by our model. Not a lot: we're not explaining marker three's variation that well with the variable we've included. We also have an F statistic for the whole model, which again is about the variance being explained. The model is significant, so it's explaining more than zero variance, but not much more; that's essentially what it's saying.

This is all really useful information, but we can also look at the structure of the summary. Just to go back: we created this object linmod, and we ran summary to get all of this output. Now let's use str to see which components of the summary can be extracted from this list. Even though it prints out nicely here as formatted output, summary is a list that contains a lot of details that we can extract, if we want to, for example, do what we did before with our t-test, where we extracted individual components and put them in a table. So here I'm making a kind of compound command: I'm still looking at the summary of my linear model, but I use str to look deeper into it and see what's there and accessible. I can pull out the formula, and I can pull out the terms; that's fine. What I really like is that I can pull out my residuals and plot them. If I want to see how normal my residuals are, I can pull them out and explicitly test their normality. I can also extract individual coefficients, standard errors, and p-values. This is very similar to what we did with the t-test, where we were pulling out the point estimates and everything. Here we can pull out our coefficient and p-value if we wanted
to. So that's essentially that. Think about our loop from before, where we were looping over our t-tests and pulling out a specific component of each one. Here, maybe we want to pull out the coefficient and the p-value for all of our markers versus some outcome of interest.

Okay, so now we have a linear model and we want to add the line of fit to our plot. Here is your plot from before. What you can do is take that linear model and put it into abline. Remember abline from yesterday: it just draws a line on top of your data. What's really nice is that if you give abline a linear model, it will take the intercept and the slope from the model and use them to draw the line. Here I'm changing the line type to be dashed; that's what lty = 2 is doing. You can also add lwd to change the line width if you want to make it thicker. Whoops. And yeah, this will give you a line of fit over your data.

So I'm going to let you guys add that line of fit; go ahead and click yes once you have it. One second, I'm just going to add this guy on here. Add that line of fit on top. It should be a pretty quick command, and if you get it easily, try to change the color of the line, change the width, change the dashes, just revise it and make it work for you. Awesome. If you wanted to, you could also add another abline that's just a vertical or a horizontal line. They will just keep adding on to your graph, so you can keep adding these ablines.

All right, looks like you guys have it. I'm going to go ahead and clear and go on to the next one. Now write a function that allows you to input two columns, or two vectors, and output the summary of the linear model of the first input column versus the second input column. So for this one, your first input is going to be
your y and your second input column is going to be your x. Click when you're done. If you have a very quick memory, you can see that, no, I put these in the wrong order. If you're able to get this quickly, a challenge will be to not output the whole summary: try to output only the coefficients of your linear model.

All right, and if you have it, keep trying to make it more complex. You can output only the coefficients, not the whole summary. You could also add a plot to be printed at the same time the function runs, where you overlay the line of fit on your graph. These are all just things you can add on if you're able to get this function running. Okay, let's do it.

For anyone who needs a hint: the function is going to take two variables. If you go back to the anatomy of a function, the arguments are what you define in the parentheses, and you'll need two arguments. So define two arguments, and then within the curly brackets run your linear model and output the summary. So just two lines. And remember, it will just show up under your functions; it's a new function in your environment, so it won't output anything. But you can test it by running it with marker one and marker two, for example, as your inputs, your actual vectors.

It may need print around plot, Karmann, or if you put the plot call above... Ah, it's because you're plotting linmod. You want to plot the x and y values there, so column one and column two.

So just remember: you define a function, and then you have to run the function to see output from it. You'll see the function up in your global environment under Functions, so you should have a new function there with the name that you gave it. And let's see. There you go.
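One way the exercise above might be approached is sketched below. This is not the solution from the slides: the function name fit_line, the argument show_plot, and the simulated vectors marker2 and marker3 are hypothetical names chosen for illustration. The function fits y against x with lm, optionally draws the scatter plot with a dashed line of fit via abline, and returns only the coefficient table from the summary.

```r
# Hypothetical helper (not the slide solution): fit y ~ x and return only
# the coefficient table. The first argument is the outcome, the second the
# predictor.
fit_line <- function(y, x, show_plot = FALSE) {
  fit <- lm(y ~ x)
  if (show_plot) {
    # Plot the raw x and y values, not the model object itself.
    plot(x, y, pch = 19, col = "blue")
    # abline() takes the intercept and slope from the fitted model.
    abline(fit, lty = 2, lwd = 2)  # dashed, thicker line
  }
  summary(fit)$coefficients
}

# Simulated vectors standing in for two marker columns (assumed names).
set.seed(42)
marker2 <- rnorm(100)
marker3 <- 1 + 0.3 * marker2 + rnorm(100)

# Defining the function prints nothing; it has to be called to see output.
coef_table <- fit_line(marker3, marker2, show_plot = TRUE)
coef_table

# Individual pieces can be pulled out by row and column name.
slope   <- coef_table["x", "Estimate"]
p_value <- coef_table["x", "Pr(>|t|)"]
```

Because the formula inside the function is written in terms of its own arguments, the predictor row of the table is always named x, whatever vectors are passed in.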
Yeah. All right, and don't forget to click yes if you got it. I think a lot of you got it.

So here we're not defining what the variables are?

The inputs are the inputs. You just have to give two variables as inputs, yeah: column one, column two.

Okay, I'm just peeking at the solution in the slides, but I'm trying to figure out...

Yeah. I'll just put the solution up, since we're seeing it all around. See, you need to provide two variables; these are the two columns. These variables are then used in your function to do your linear model. But remember, once this is defined it will only show up as a function in your environment. Then you can use it to actually put in...

Once I define what the variables are, later on?

Yes, exactly.

Oh, I'm just setting the ground for that.

That's right. A function is totally general. You're just setting it up so that if you put in anything, you can run it and get the output that you want. It's just taking some variables, putting them through some procedure, and spitting out something else in a totally general way. That's why the variables you define here are just for the function to use.

And I've found a typo, so I'm going to keep going. If you're having trouble with it, keep going, try to get it to work, then run a test line with it, and click yes once you've done it. I want to make sure you're getting these functions, running the linear model, and running things through these kinds of procedures, because this is really key.

Awesome. We've got some examples where people put in default variables, which is great, because then you can run it without actually specifying which variables these are. Let's see what we have here. Ran, it looks like you're just running things inside the function, do you know what I mean? The function has to run all as one unit, and I see just individual lines being
run, so it seems like... oh, yeah, probably your cursor is on one of these intermediate lines.

It's fine. Okay, I just selected all of it.

Yes, so that's one way. Or if you're on the outside here, you can just have your cursor on the top line, for example.

Oh, the top line. So I wasn't...

Yeah. The thing is, in between the curly brackets, and this goes for loops too, R is going to read this as a single line of code. But if you're up at the function level, it's going to say, oh, it's a function, I'm going to run this line and everything in the curly brackets. Same for a loop: if you're at the for level, the very top level, it's going to say, oh, this is a loop, and run everything in those curly brackets. But if you're inside the curly brackets, it doesn't know. It would just say, this is one line of code, I don't recognize any of these variables, I don't know what you're doing, and it'll give you an error.

Okay, thanks. So now we can define val 1 and val 2?

Exactly, and you can run it and get your output.

I see, okay.

And even better, you can run it in a loop. If you have many, many markers that you want to test, each run with a specific coefficient that you care about but with other controls included in your model, you can add those on. So I'm just going to edit it here, actually, because this is the last piece before we take a very short break and then go on to Markdown.
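The idea of running the model in a loop over many markers, keeping the coefficient of interest while controlling for something else, could look roughly like this. It is a sketch under stated assumptions: the data frame dat, the columns outcome, sex, and marker1 to marker3, and the results table are all made up for illustration and are simulated here, so the numbers themselves mean nothing.

```r
# Simulated data; every name here is an assumption for illustration.
set.seed(7)
n <- 80
dat <- data.frame(
  outcome = rnorm(n),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  marker1 = rnorm(n),
  marker2 = rnorm(n),
  marker3 = rnorm(n)
)
markers <- c("marker1", "marker2", "marker3")

# One row per marker, filled in as the loop runs.
results <- data.frame(marker   = markers,
                      estimate = NA_real_,
                      p_value  = NA_real_)

for (i in seq_along(markers)) {
  # Build the formula outcome ~ <marker> + sex for this marker.
  form  <- reformulate(c(markers[i], "sex"), response = "outcome")
  coefs <- summary(lm(form, data = dat))$coefficients

  # Keep only the marker's own coefficient and p-value.
  results$estimate[i] <- coefs[markers[i], "Estimate"]
  results$p_value[i]  <- coefs[markers[i], "Pr(>|t|)"]
}

results
```

Passing data = dat with a formula built from column names, rather than writing dat$marker1 inside the formula, keeps the coefficient rows named after the markers, which makes them easy to look up by name.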