Hi folks, this is Dr. D. I apologize for being late on this walkthrough. The flu hit me a little harder than I thought it would, and I've relapsed a bit. So bear with me; we'll get through this remix, and if necessary you can take an additional day to turn in your work.
This is the Lab 5 remix. This is the remix "student name" .Rmd file, and this is the one I've worked through that has a solution in it, which is why its name ends in "-solution". I always like to keep the original copy for safekeeping so I don't have to worry about it; I just make a copy and name it the way I want.
One thing I want to point out that may help some of you: the newer versions of RStudio and Posit Cloud, which is RStudio in the cloud, now have a Visual tab, and it may help you see what's going on. In Source you get the raw code, including, for example, the markup for an image, and all of that can be distracting to some people. If you click on the Visual tab, you get a message saying you're activating the R Markdown visual editing mode. You can leave the check mark on if you don't want to see the message again, or take it off. I'm going to take it off because I'm making more videos, and then click "Use Visual Mode".
The top part doesn't look like it has changed, but it has. As you scroll down, you can see it has taken all the Markdown coding and highlighting and made it look more like a web page, closer to how your final report will look when you knit it to Word or PDF. You can still see the code chunks, and you can still paste into them the same way you could in Source view. When I'm editing a lot of code I sometimes go back to Source view because it just feels more familiar to me, but for many of you the Visual view will make it easier to read and to see what's going on.
One other thing I always do is make sure we're building a reproducible document. That means everything someone else would need in order to reproduce what you've done has to be in this .Rmd document. Think about objects over here in the Environment pane that you created in prior sessions: if the code that sets them up is not in this document, then whoever wants to reproduce your work cannot do it. So, to make sure I haven't forgotten anything, I go up here to the broom, the clean sweep, which removes all objects from the environment. This affects only this one workspace in this one work session. I'm going to click Yes, and that cleans out my environment. Now I know that when I create something it will pop up over here, and I know nothing left over will trip me up when I knit the final document.
This first code chunk matters when we're knitting; you don't need to worry about it right now. The first one we need to run is this one. It loads the tidyverse, which we use a lot, and the infer package, which has the functions we need for this assignment. prettydoc is just something that helps your final knitted document look a little better. I'm going to click Run to run those. Down in my console I've got my ready prompt back and nothing is still running, so I know they ran successfully.
Let's scroll down; you can read these as we go. The first thing we need to do is set the seed, and you can read the link up there that explains why you set it and why you give it a number. Here we're saying 76 is the seed. All this does is make it possible for us to reproduce our results when we run one of the random functions.
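
Here is a quick sketch of what the seed does, in base R only. The seed value 76 is the one from the walkthrough; the `sample()` calls and the object names are made up for illustration and are not part of the lab.

```r
# Two draws made right after the same seed are identical.
set.seed(76)
first_draw <- sample(1:100, 5)

set.seed(76)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE

# A further draw without resetting the seed continues the random
# sequence, so it will generally not match the first two.
third_draw <- sample(1:100, 5)
```

Resetting the seed before a random step is what lets someone else knit the same document and get the same numbers.
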
We give it a number to tell it where to start, so that every time it runs it starts from the same place and we get the same answers. If you don't set a seed, you'll get a different answer every time you run the code.
Here we're going to read in the SAT GPA data. This is our main data file, and I'm just going to click Run. You can see we get some messages, and over there sat_gpa shows up in our environment. Notice this is a message. It doesn't say error anywhere, and it doesn't say warning anywhere; it's just a message. Let me show you briefly how to suppress those messages. Inside the chunk header up there, after the r, put a comma and a space and start typing "message"; it will offer message =, and we set it to FALSE. Then I start typing "warning" and set warning = FALSE as well. Let me save this. Now when I run this chunk, it won't display messages or warnings. It will still display errors, and you must leave that alone; you don't want to suppress errors. But it gets rid of this red text. There you go: we've loaded the data again and we don't have the red message. That's something you can use on any of your rehearses and remixes. Put that little bit of code in the chunk header and it will suppress the messages and warnings and make for a cleaner final document.
The first thing it says we've got to do is inspect the data. We've got this data object, sat_gpa, over here. It says there are a thousand observations of seven variables; that's a thousand rows. You can get a glimpse of it by clicking the down arrow, which lists each of the variables and says what type it is. The first one is just the row number. The first real variable is sex, a character variable: qualitative, categorical, made up of Male and Female. Note how the names are spelled, capitalized, or hyphenated, because you've got to reproduce them exactly.
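
For reference, the message and warning settings described above look roughly like this. The per-chunk form is shown as a comment because it lives in the chunk header line, not in the R code itself. The document-wide call below it is an alternative the walkthrough does not use; it is shown here as an assumption about what some people prefer, and it needs the knitr package.

```r
# Per-chunk form (goes in the chunk header, as described in the walkthrough):
#   {r, message=FALSE, warning=FALSE}

# Document-wide alternative, usually placed in the first (setup) chunk.
# Errors are still shown either way.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```
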
sat_verbal is the next variable; it's quantitative, so it's numbers. These are the two we're going to be using in this remix. To see them more clearly, I like to click or double-click on the name, and it opens the data object in the source editor, where it looks more like an Excel spreadsheet. You can see we've got the columns, and these are the rows. And there are our two variables of interest, sex and sat_verbal. The example code you're given uses sat_total, I think, and GPA, but as you go through and edit the code chunks you'll be using these two variable names.
Let me get back into the remix. So that's what I mean by inspect the data: identify the variables of interest and how the data is presented. You can use the glimpse function, which gives you a sneak peek. I prefer opening it in the source editor window so it looks like an Excel file, an ordinary table, which most people are familiar with and find easy to understand.
Here's our first question, and I'm going to change the wording when I edit this document: is there a difference in male and female SAT verbal scores? So it's looking at sex as a simple dichotomy. I know we recognize more categories today than we used to, but we're using just two, male and female, and these are self-reported. That's in the sex variable. Then we're going to use the sat_verbal score. So those are our two variables, and we're going to group the scores by male and female.
The first thing it says is to calculate the average SAT verbal score for each gender using the group_by and summarize commands from the dplyr package. dplyr is in the tidyverse; it's one of the packages in there. The instructions say you'll need to write code for this, but we've given you that. I put a note there, and I moved the hint from where it is in your copy.
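
The inspection step described above can be tried on a small stand-in table. This is only a sketch: `toy_sat` and its four rows are invented for illustration, and the real lab reads the full sat_gpa file instead. Only the column names `sex` and `sat_verbal` come from the walkthrough.

```r
library(dplyr)

# Hypothetical stand-in for sat_gpa with just the two columns of interest.
toy_sat <- data.frame(
  sex = c("Female", "Male", "Female", "Male"),
  sat_verbal = c(47, 52, 50, 49)
)

# Compact overview: one line per column, with its type and first values.
glimpse(toy_sat)

# Counting the categories shows their exact spelling and capitalization.
count(toy_sat, sex)

# In RStudio, View(toy_sat) opens the spreadsheet-style viewer instead.
```
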
This hint was in the wrong place. It says you will need to create a new data object. One thing I see students get confused by is the example we give them. I've got it here in green, preceded by a pound sign. You can see the first step in the old code was to create the average GPA by sex object, which compares average GPA by sex using sat_gpa, our main data source. We want to change that name so we know we've got a different object. Since we cleaned out the environment, it's a little less complicated, but this way I know what I've named the new object.
The new code just says we want to create average verbal sex using sat_gpa, then the pipe operator, then group_by sex, and then summarize. The old summarize had a GPA column set equal to the mean of GPA. Now we change it: we're creating a new column in the summary, so to speak, and sat_verbal is the name we're giving it. Inside mean we put sat_verbal, which is our existing column, so we get the mean of that. Then we just print it out, so I can run this code.
I know I'm getting long-winded here, but this is helpful: keeping the old code and putting the pound sign in front of it changes the color and tells R to ignore it. By looking at the original next to what I used, I can help myself when I go back to reuse this code or when I have some error checking to do. So I recommend keeping the old code: put a pound sign in front of it, copy and paste it below, and then edit the copy appropriately. Here again we've changed the two variable names to sat_verbal; the old one was gpa_fy. So there's our first result, and this is the way I recommend you go through the rest.
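
The group-and-summarize pattern described above looks roughly like this on a small made-up table. The data values and the names `toy_sat`, `avg_verbal_by_sex`, and `mean_sat_verbal` are assumptions for illustration; the lab's own object and column names may differ.

```r
library(dplyr)

# Hypothetical stand-in for sat_gpa.
toy_sat <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  sat_verbal = c(46, 50, 49, 51)
)

# Old example kept as a comment for reference (it used gpa_fy):
# ... %>% group_by(sex) %>% summarize(... = mean(gpa_fy))

# Edited copy: same structure, new object name, new variable.
avg_verbal_by_sex <- toy_sat %>%
  group_by(sex) %>%
  summarize(mean_sat_verbal = mean(sat_verbal))

avg_verbal_by_sex
```

On this toy data the result has one row per sex, with means of 48 for Female and 50 for Male.
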
We're just going to do a simple subtraction here, taking these two values, about 48.6 and 49.26, and subtracting them. Now, I subtracted the male from the female. You can do it either way; just be consistent throughout. Once you start with female on the left side and male on the right, you've got to continue that way. Granted, this gives a negative number, but it's just my practice to keep them alphabetical: F, then M.
Then we have these questions to answer. What is the difference in sample mean verbal scores? That's what we just calculated: minus 0.644, with the male mean being higher than the female mean, and that's what I wrote there. Next we're making a guess: is this difference statistically significant? We haven't done the calculations yet, so I don't really know. But just by looking at it, you can see the difference is about 0.64 and the scores are around 50. If it were 1 out of 100, that would be 1%; this is 0.64 over about 49, which works out to a little over 1%. Is that really a big enough difference to call statistically significant? My guess would be no. Here I added some code that just does that simple division and gives you the roughly 1% figure. You can add that if you want.
Next is to generate a visualization of the verbal scores. Be sure to include a title and label your axes. Here again I've got the given code chunk, and on the lines I'm editing I put the pound sign in front so they turn green. I hope you can see that. When we're doing ggplot, remember it uses the plus sign instead of the pipe operator to chain things together. In some of the assignments you'll see us go from tidyverse code with the pipe operator into ggplot, and from there on we use the plus sign. The very last line here has nothing after it, and that's what tells R to stop; there's nothing after this point, so it stops processing.
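
The back-of-the-envelope check from this part of the walkthrough can be written as a couple of lines of base R. The two numbers are the ones quoted in the video; the object names are made up for illustration.

```r
# Difference in sample means, female minus male, as reported above.
observed_diff <- -0.644

# Male mean as quoted above, used as the reference value.
male_mean <- 49.264

# Size of the difference relative to the scores themselves.
relative_size <- abs(observed_diff) / male_mean

round(100 * relative_size, 1)  # 1.3, i.e. a little over 1 percent
```

This is only a rough sense of scale, not a significance test; the later steps of the lab do the actual testing.
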
But if I ended that line with a plus sign or a pipe operator, R would expect more code to follow. So that's how that works.
Again, I've just swapped things out. We're using the same main data set, so sat_gpa stays the same, and we're still using sex as one of our variables. But now we're using sat_verbal for the y variable instead of gpa_fy. I've edited the title to change it from grade point averages to SAT verbal scores. Down here I've edited the y-axis label as well, from the GPA label to "SAT verbal score". We run that, and there we go: it gives us two box plots.
These dark lines are the medians, and they're pretty close together. There's a little bit of difference: that's the male on this side, and the female over here is a little lower. I forgot to label that, but you can go back and do it if you want. Again, that's why I don't think it's statistically significantly different. There's a difference, but there's a difference between a difference and a statistically significant difference, and you have to learn that.
Then we state our null and alternative hypotheses. The null is usually some form of "no difference", so I just say there is no difference between female and male SAT verbal scores. The alternative is the opposite: there is a difference. We're testing the null, and we start with the assumption that there is no difference, so that if you subtracted female from male, or vice versa, you'd get zero. That's how we get going.
You just go through each of the steps. I think we give you the starting code. I really recommend getting into the habit of using the pound sign, identifying what the variables were and what they are now, and identifying the data source; then reuse the data source unless you've got a reason to change it. But I usually like to rename the result, as we do here again.
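
The box plot described above can be sketched like this. The data in `toy_sat` is invented, and the title and axis label text are guesses at the wording; only the variable names `sex` and `sat_verbal` and the use of plus signs between ggplot layers come from the walkthrough.

```r
library(ggplot2)

# Hypothetical stand-in for sat_gpa.
toy_sat <- data.frame(
  sex = rep(c("Female", "Male"), each = 5),
  sat_verbal = c(44, 47, 48, 50, 53, 45, 48, 50, 51, 54)
)

# Layers are chained with +, not with the pipe operator.
verbal_boxplot <- ggplot(toy_sat, aes(x = sex, y = sat_verbal)) +
  geom_boxplot() +
  labs(
    title = "SAT verbal scores by sex",
    x = "Sex",
    y = "SAT verbal score"
  )

verbal_boxplot
```

The thick line inside each box is the group median, which is what the comparison in the walkthrough is reading off the plot.
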
We're creating another data object. That's what this symbol does; I like to read it as "create": using the sat_gpa data, create a new data object called observed difference verbal sex. That makes sense to me. Then we have sat_verbal as a function of sex; I think that was already set in there. Then we calculate the statistic; stat goes in here, and the one we want is the difference in means. And the order: since I started with female first, we want Female first and then Male. Of course, if you started with male, you would keep that order all the way down. This last line just prints it out. You can see I changed the name again, from observed difference GPA sex, the old one, to observed difference verbal sex.
So we run this, and the first thing we see is that we've got an answer. It says my response variable was sat_verbal, which is numeric, and my explanatory variable was sex, which is categorical or qualitative, sometimes called a factor. And there's that difference again, minus 0.6444. That's what's stored over here in observed difference verbal sex; you can see it shows up in the environment as one observation of one variable. If I click on it, we see our statistic over here. So that's the basic process: go through it, be patient, and identify which variables you want to change.
Here's the next code chunk. The original object is GPA in the null world. Now we're dealing with verbal, so I changed it to SAT verbal in the null world, still using the basic sat_gpa data. We're using sat_verbal as a function of sex, as opposed to gpa_fy as a function of sex. The null is independence; leave that alone. In generate, with the reps, we don't have to change anything either. Down here I'm again just printing out the new data object so I can see it. I run it; it takes a while, and then you can see the result over here. Let's close that.
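
The two infer steps described above, the observed difference and the null world, can be sketched on a small made-up table. The data, the object names `toy_sat`, `obs_diff`, and `null_world`, and the choice of 100 reps are all assumptions to keep the example fast; the lab's own chunk sets its own names and reps.

```r
library(dplyr)
library(infer)

set.seed(76)

# Hypothetical stand-in for sat_gpa: 8 rows, 4 per group.
toy_sat <- data.frame(
  sex = rep(c("Female", "Male"), each = 4),
  sat_verbal = c(46, 48, 50, 52, 47, 49, 51, 57)
)

# Observed statistic: difference in means, Female minus Male.
obs_diff <- toy_sat %>%
  specify(sat_verbal ~ sex) %>%
  calculate(stat = "diff in means", order = c("Female", "Male"))

obs_diff

# Null world: shuffle sat_verbal across the sex labels many times,
# assuming the two variables are independent.
null_world <- toy_sat %>%
  specify(sat_verbal ~ sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute")

dim(null_world)
```

On this toy data the observed difference is -2, and the null world has 8 rows for each of the 100 reshuffles, so 800 rows and three columns. The same arithmetic explains the row count in the real lab: rows in the data times the number of reps.
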
We've got this new one, SAT verbal in null world. I can expand the pane so I can see the whole name. Count the zeros: there are six of them, so it's one million observations, which means one million rows, and we've got three variables. We can see them here: sat_verbal, sex, and replicate, which records which of the reshuffled copies of the data each row belongs to.
So this is how you go. This has been a long walkthrough, and I apologize. I will try to give you some more help, but you do this all the way through each of the sections where we give you the code. Again, I recommend putting the pound sign in front, which tells R this is no longer code; it's a comment. Then you can use it as a guide, change the names of the data objects so you can keep them distinct from the old ones, and make whatever other changes you need. Mainly we're going to be swapping gpa_fy for sat_verbal throughout. I hope this helps.