Hello, this is the video covering Module 2, Descriptive Statistics. Our first stop will be measures of location. As the name suggests, we talk about the location of one data value with regard to the entire data set. The terms measures of location, measures of relative standing, and measures of position are used interchangeably depending on what textbook you use and who you talk to. They describe the location of a data value relative to the other values in a data set. We are going to cover quartiles and percentiles. Z-scores are also measures of relative standing, but we'll discuss those in a later topic.

So our first stop is going to be quartiles. They're measures of location that separate the data into quarters. So believe it or not, quartiles split the data into quarters. They may or may not be a part of the data, so a quartile value may or may not be a data value in the data set itself. The first quartile, also referred to as Q1, separates the bottom 25% of sorted values from the top 75%. So quartile 1 is just above the first 25%, or the first quarter, of the data. The second quartile, Q2, is the same as the median, which will be discussed in a little bit. It separates the bottom 50% of the sorted values from the top 50%. So your data have to be sorted to find quartiles, and the second quartile is right smack dab in the middle. And then the third quartile, or Q3 as it's notated, separates the bottom 75% of sorted values from the top 25%.

For those of you that want a visual: in your data set you'll always have a minimum value and a maximum value, and then you have your quartiles spaced out accordingly. So they literally break the data into quarters: 25%, 25%, 25%, 25%. For a data set, the five number summary consists of the following five values: minimum value, first quartile, second quartile, third quartile, and the maximum value.
So that's what a five number summary is. We'll actually practice finding the five number summary and the quartiles contained within it in just a moment. But just as a little extracurricular discussion here, the interquartile range, also called the IQR, is found by subtracting quartile one from quartile three. And what that's used for is to find outliers. An outlier is an observation that does not fit the rest of the data. It's kind of far away from the rest of the data. The specific definition is that a data value is an outlier if it is more than one and a half times the interquartile range below the first quartile, or more than one and a half times the interquartile range above the third quartile. So that's how you find the cutoffs for outliers both below and above the typical values of the data set. Like I said, in this course we'll just look at data and visually identify if something is an outlier or not. We're not going to get too technical with it.

So the five number summary is actually used to make something called a box plot. Some of you might know this as the box and whisker plot, because you have literally your kitty cat face right here, this box, and then you have the whiskers coming off from the sides. Notice you have a number line, and that's marked with numbers obviously, and you have your minimum data value, your maximum data value, your median, your first quartile, and your third quartile. You literally draw a box that connects quartile one with quartile three, put a line through the median, and draw the whiskers going out to the minimum and the maximum value. So a box plot, or box and whisker diagram, gives you a quick picture of the middle 50% of the data. That box is literally the middle 50% of the data.
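As a rough illustration of the one-and-a-half-times-IQR rule just described, here is a minimal Python sketch. The function names and the sample quartiles (10 and 20) are made up for illustration and do not come from the video's data.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) outlier cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1                      # interquartile range: Q3 minus Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the two cutoffs."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles and data, chosen only to show the arithmetic.
print(outlier_fences(10, 20))                           # (-5.0, 35.0)
print(find_outliers([12, 14, 40, -7, 18], q1=10, q3=20))  # [40, -7]
```

With Q1 = 10 and Q3 = 20 the IQR is 10, so anything below 10 - 15 = -5 or above 20 + 15 = 35 would be flagged.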
So how do you construct a box plot by hand? Well, you need that five number summary, that's the most important part. Then you make a number line, you construct a rectangle extending from quartile one to quartile three, and you draw a vertical line that goes through the median, or quartile two. Then you draw your whiskers going out to the minimum and maximum values.

So let's practice finding the five number summary and constructing a box plot from the following data. I have a whole entire data set: we have one, two, three, four, five, six, seven, eight data values, times two, so sixteen data values present. My goal here is to find the five number summary. I'm just going to make a list, how about that: minimum, quartile one, quartile two (remember that's the median), quartile three, and then the maximum value.

Alright, so the data have to be sorted in order to find this five number summary. So make sure the data are sorted here. First off, we have a minimum value of five and a maximum value of sixty-two. Now I need to find the middle data value, so I alternate, crossing off one value from each end. I have five, I have sixty-two. I have six, I have fifty-four. I have six, I have forty. I am finding the middle data value. I have nine, I have thirty-six. I have eleven, I have twenty-six. Thirteen, I have twenty-six. And looking at what we have here, we have two values left in the middle: sixteen and eighteen. Our median, or our second quartile, will be sixteen plus eighteen over two. So a little calculation off to the side: find the average of sixteen and eighteen, and that's just seventeen. So your second quartile is going to be seventeen. I'm going to place seventeen here, right between sixteen and eighteen. Now what I have done here is that below seventeen is fifty percent of the data, and above seventeen, or greater than seventeen, will be the other fifty percent of the data.
I now need to look at the bottom fifty percent. I'll go ahead and circle it. Look at the values that are below the median, below the second quartile, and find the middle value. So cross out five, cross out sixteen. Six, thirteen. Six, eleven. And look here, we have two values left in the middle, seven and nine. Average them to get eight. It's the same thing for finding the third quartile: look at all the values above the median and start crossing out one number from each side, and you're left with two values in the middle, thirty-six and thirty-seven, so you have thirty-six point five. That is my five number summary.

Now, you don't have to go through and find the five number summary by hand. You can use technology to help you find it. So what I'm going to do is go to my Google Sheets spreadsheet. I'm currently on the one variable stats tab, or one var stats tab, and I literally just type my data in column A. I'll type in my sixteen data values. You first want to highlight what's already there and press the delete key to get rid of it. Now let's type the data values: five, push enter, six, push enter, six, enter, seven, enter, nine, enter, eleven, enter, thirteen, enter, sixteen. Make sure you're pushing enter after you type in each data value. If you just push the down arrow key, the spreadsheet will not register the data value you typed in, so you have to push enter. Do not use the arrow keys to input the data.

So I'm almost done typing everything in. You get a minimum value of five, quartile one is eight, the median is seventeen, quartile three is thirty-six point five, and the maximum value is sixty-two. There's even a calculation for the interquartile range, but we're not really worried about that.
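For anyone who would rather check the by-hand method in code, here is a minimal Python sketch of the same "median of each half" approach. Two things are assumptions: the data list is reconstructed from the values read aloud while crossing off (the original table is not shown in this transcript), and for an odd number of values the sketch leaves the median out of both halves, a case the video does not cover. Other software may use a different quartile convention and report slightly different values.

```python
def median(sorted_vals):
    """Middle value of an already sorted list (average of the two middle values if even)."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)              # quartiles require sorted data
    n = len(data)
    lower = data[: n // 2]             # values below the median
    upper = data[(n + 1) // 2:]        # values above the median
    return data[0], median(lower), median(data), median(upper), data[-1]


# Assumed data set, reconstructed from the walkthrough.
data = [5, 6, 6, 7, 9, 11, 13, 16, 18, 26, 26, 36, 37, 40, 54, 62]
print(five_number_summary(data))       # (5, 8.0, 17.0, 36.5, 62)
```

The result agrees with the values found by hand: minimum 5, Q1 8, median 17, Q3 36.5, maximum 62.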
So I have my five number summary. Does it match with what we found? It sure does. So now we're going to draw our box and whisker plot, so we're going to draw our kitty cat. Make sure you first start off with your number line. I need to go all the way from five up to sixty-two, so I'll do five, ten, fifteen, twenty, twenty-five, thirty, thirty-five, forty, forty-five. I'm going to label all the way until I get to sixty-five. Cutting it close. Alright, so you first want to plot: put a dot at five, put a dot somewhere around eight (yeah, five and eight are really close), a dot at seventeen, a dot at thirty-six point five, and a dot at sixty-two. Then take your first quartile and your third quartile and draw a rectangle that goes from one to the other. Through your second quartile, your median, that's Q2, draw a line. And then from the rectangle draw little lines, or whiskers, going out to your maximum value and your minimum value. And that's how you draw a box plot, or a box and whisker plot, or a box and whisker diagram, whatever you want to call it. So a nice pretty picture right there for you. Once again, we can use technology to find the five number summary, and then we can create our box and whisker plot based on that summary.

Our next stop will be percentiles, which divide a set of data into hundredths. You have a minimum value, a maximum value, and ninety-nine percentiles in between. They may or may not be a part of the data set. The median is the fiftieth percentile; remember, that's quartile two. The first and third quartiles are the twenty-fifth and seventy-fifth percentiles, respectively. And I already said there are ninety-nine percentiles. They are represented using a capital P with a subscript: P sub one is the first percentile, P sub two is the second percentile, P sub ninety-nine is the ninety-ninth percentile.
So if you see, for instance, P sub thirty-five, this is called the thirty-fifth percentile. It's the data value which has thirty-five percent of the data less than or equal to it. That's the definition we are going to use for percentile in this course. So P sub thirty-five means thirty-five percent of the data is less than or equal to it. Some books don't do the equal to part, it just depends on who you're dealing with.

If I ask you to calculate a percentile, we first find this little number i, which is called a locator. To find the kth percentile, for instance if I was trying to find the fiftieth percentile, I take fifty, I divide by a hundred, and I multiply by n, the number of data values. If i is an integer, so that's a nice pretty whole number, then the data value for the percentile is the average of the data value in whatever position your locator number is and the data value after it. Let's put it in real terms instead of all this notation: if your locator is equal to ten, then you're going to average the tenth and the eleventh data values. If i is not an integer, meaning it's a decimal, you round the value of i up, and the kth percentile is the data value in that position. So if you get something like twelve point four, you're going to round that up, and that specific percentile will be the thirteenth data value. That's what we're dealing with here.

Do you ever wake up in the morning and just eat a big bowl of data? Well, you may say no, but without realizing it some of you may actually do so. For instance, if you look at cereal, cereal is full of data. I collected forty boxes of Lucky Charms, and the number of marshmallows was counted in each, because honestly, with Lucky Charms, that's the best part of the cereal, the marshmallows. So if I get shortchanged on marshmallows, you're going to be hearing about it.
So I collected these boxes, I counted how many marshmallows were in each, and I recorded the information in the table below. Out of all 40 boxes I had as few as 121 marshmallows and as many as 166. Find the 42nd percentile. That's the data value which has 42% of the data less than or equal to it. To find the 42nd percentile, what I have to do first is find my locator. Remember, the locator is that lowercase i. You take whatever percentile you're finding, divide by 100, and multiply by the number of data values. So that's literally .42 times 40, and that's going to give us 16.8. I got a decimal for my locator. So by the previous slide, i is a decimal, so we round it up. Always round up, even if it was 16.1. So round up to 17. In my sorted data set I want the 17th data value. These are in rows of 10, so 10, 11, 12, 13, 14, 15, 16, 17. The 17th data value, aka the 42nd percentile, P sub 42 (that's the notation for it), is 143.

Let's try another one, the 25th percentile. 25 divided by 100 times 40, so that's .25 times 40, or 10. Please beware: this does not mean you want the 10th data value, as logical as that may seem. That's not going to quite be enough to be your 25th percentile, because remember, the 25th percentile needs a quarter of the data below it, so 10 data values below it. Instead you have to find the average of the 10th and 11th data values in the sorted data set, which are 130 and 133. So P sub 25, the 25th percentile, will be 130 plus 133 divided by 2. Average the two numbers. That will give you 263 over 2, which is 131.5. So that is your 25th percentile. 25 percent of the data values, or 10 data values, will be less than or equal to 131.5.

What about the other way around? What if I give you a data value and you tell me what its percentile is? It's like when you used to take a standardized test or something.
It'd be like you were in the 60th percentile. Well, what did that mean? It meant you performed better than at least 60% of the other people that took the same test. So the percentile of a data value x is the number of data values less than or equal to x, divided by the total number of data values, times 100. So I'm looking at less than or equal to.

So find the percentile of a box with 145 marshmallows. You want the number of data values less than or equal to 145. I'll use the less than or equal to sign there. How many data values is that? Well, there's 145. There are 10 data values per row. That's 20 data values. Some textbooks treat this a little bit differently. Like if you have three 145s, they only count them as halves, so 1.5. But we're going to count them each as individual data values. So three, and then you have 17 other values. So 20. So my percentile is going to be 20 out of 40 times 100, so 0.5 times 100. That would be 50. That would be the 50th percentile. 50 percent of the data values are less than or equal to 145. Like I said, different books have different definitions, but that's the one we'll use. Anyway, thanks for watching.
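The two percentile procedures from this video (finding the kth percentile with the locator, and finding the percentile of a given data value) can be sketched in Python as follows. This is only a sketch under stated assumptions: the function names are invented here, k is assumed to be a whole number from 1 to 99, and because the marshmallow table is not reproduced in this transcript, the demonstration uses the sixteen-value data set from the box plot example, reconstructed from the values read aloud. Only the locator arithmetic for n = 40 comes from the marshmallow examples.

```python
def locator_positions(k, n):
    """Return the 1-based position(s) that define the kth percentile of n sorted values.

    The locator is i = (k / 100) * n. Integer arithmetic is used so that a
    whole-number locator is detected exactly (assumes k is an integer, 1 to 99).
    """
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        return (whole, whole + 1)      # i is an integer: average position i and the next one
    return (whole + 1,)                # i is a decimal: round up to the next position


def kth_percentile(sorted_vals, k):
    """Value of the kth percentile for an already sorted list."""
    positions = locator_positions(k, len(sorted_vals))
    return sum(sorted_vals[p - 1] for p in positions) / len(positions)


def percentile_of_value(values, x):
    """Percent of data values less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return at_or_below / len(values) * 100


# Locator arithmetic from the marshmallow examples (n = 40 boxes).
print(locator_positions(42, 40))       # (17,)     i = 16.8, round up to the 17th value
print(locator_positions(25, 40))       # (10, 11)  i = 10, average the 10th and 11th values

# Assumed data set, reconstructed from the box plot walkthrough.
data = [5, 6, 6, 7, 9, 11, 13, 16, 18, 26, 26, 36, 37, 40, 54, 62]
print(kth_percentile(data, 25))        # 8.0
print(percentile_of_value(data, 16))   # 50.0
```

On this data set the 25th, 50th, and 75th percentiles from the locator rule come out to 8, 17, and 36.5, the same as the quartiles found by hand.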