Hey. Welcome to another week. In today's session we are going to discuss quartiles, which are also measures of dispersion. With the quartiles, we are going to learn how to create what we call a box plot from the five-number summary of your numerical data, and part of that five-number summary is made up of your quartiles. Then, for those who find the quartile method difficult, I'm also going to show you how you can use percentiles to get the same answers, because quartiles divide your data into four equal parts of 25 percent each.

Okay. By the end of the session today we will have covered, like I already mentioned, how to find quartiles, how to find percentiles, how to identify the five-number summary, and how to construct a box plot. There will also be some activities that we can all do to practise the concepts we have just gone through.

Quartiles, like I said, are measures of dispersion. We're still continuing with numerical data sets: quartiles split your numerical data set into four segments, or four parts. Your data needs to be sorted. You cannot split your data into four parts while it is scrambled; you need to sort it in ascending order. And using quartiles alone can be misleading, because remember, you are slicing the data into four equal groups. Quartiles split the data in 25 percent packets. The first 25 percent of the data falls below a certain threshold. The next 25 percent, together with the first, makes up the 50 percent of the data that falls below a second threshold. Then the next 25 percent brings that up to 75 percent of the data below a third threshold.
And then all the parts together make up 100 percent. When we split the data into those parts, we create what we call quartiles; hence the word quartiles. The first quartile is the value that only 25 percent of the data falls below. At quartile two, which is the same as your median, 50 percent of the data falls below and 50 percent falls above. At quartile three, 75 percent of the data falls below and 25 percent falls above. Those are the quartiles.

How do we get those quartiles? Like I mentioned, quartile two is the median. Remember, when we were doing measures of central location, we spoke about the median, and we said the median is the middle number. In order to find the median, we first need to find its position. It is similar with quartiles: we have to find the position of the quartile before we can go and locate the quartile value. And I'm going to show you how to do that.

You must understand this: to find the quartile value, we need to find the position in the ranked data. That means our data needs to be sorted in ascending order, from the smallest value to the highest value. To find the position of quartile one, or what we call the first quartile, we take the number of observations, which is your n, plus one, and divide by four: Q1 position = (n + 1) / 4. Remember, we divide by four because we break the data into four parts. To find the second quartile position, we use the same formula as the median: Q2 position = (n + 1) / 2, because with the median you are dividing your data into two equal parts, so you divide by two. For the third quartile, we multiply by three, because we're looking at three quarters.
We say the Q3 position is three times the number of observations plus one, divided by four: Q3 position = 3(n + 1) / 4. Once we have found the position, we can go to our data set and locate the value.

Now, here is the thing. There are certain rules that you always need to remember. Remember with the median, if the position ended in point five, we found the two values it is located between, added them, and divided by two. So we took the average. We're still going to apply the same principle.

If the result of your quartile position is a whole number like one, two, five, 15, 20 or 25, you count up to that position in the ranked data, and that is where you find the quartile, whether it's quartile one, quartile two, or quartile three.

If it is a fractional half, which means the answer ends in point five, such as 2.5, 3.5, 7.5, 20.5 or 30.5, we take the two values that the position is located between. If it's 10.5, it is located between position 10 and position 11. So we count up to position 10 and position 11, take the two values at those positions, add them together, and divide by two. We take the average of those two values, and that gives us the quartile value.

If it is neither a whole number nor a half, which means it ends in point 25 or point 75, then the following happens. A position ending in point 25 is closer to the lower whole number than to the higher one. What do I mean? If the answer is 2.25, it is far away from three but close to two. Therefore, we can apply the rule that we always know: rounding down or rounding up.
So when it's point 25, we round down. When it's point 75, we round up, because point 75 is closer to the bigger value. If it's 2.25, we round down to two, so the position is two, and we count up until we get to position two and read off the quartile value. If it's 2.75, we round up to three, because it is closer to three and far from two. So point 75 rounds up, and point 25 rounds down. I hope that makes sense.

Let's go back to the data set that we have been working on. Remember our sample survey data on the quality of the statistics facilitation sessions. We know that we can classify each variable in front of us as numerical data or categorical data. In today's session we're continuing to work with our numerical data, which is the age. So let's locate the quartiles using our data set.

Remember, the raw data is not in order. We know that there are 20 records. So we sort the data from the smallest value to the highest value. Our smallest value is 18 years old and our highest value is 45 years old. From here, we're going to find quartile one, quartile two and quartile three, and we're also going to answer some other questions later on.

So let's locate quartile one. For the first quartile, we first need to find the position, which is given by (n + 1) / 4. We have 20 data points, so we say (20 + 1) / 4, which is 21 divided by 4, and we get 5.25. Remember the rule? If the position ends in point 25, we round down. So we estimate that the first quartile is at the fifth position. We start counting in the sorted data, one, two, three, four, five, and that is our quartile one value.
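The rounding rule described above can be sketched as a small Python helper. This is only an illustration of the rule as stated in this session; the function name `resolve_position` is an assumption made for the example and does not come from the course material.

```python
import math


def resolve_position(position):
    """Turn a quartile position into the 1-based rank(s) to read from sorted data.

    Hypothetical helper illustrating the session's rule:
    whole number -> that rank; .5 -> the two neighbouring ranks (to be averaged);
    .25 -> round down; .75 -> round up.
    """
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return (lower,)
    if fraction == 0.5:
        return (lower, lower + 1)
    if fraction < 0.5:
        return (lower,)      # e.g. 2.25 -> rank 2
    return (lower + 1,)      # e.g. 2.75 -> rank 3


print(resolve_position(5.25))   # (5,)
print(resolve_position(10.5))   # (10, 11)
print(resolve_position(2.75))   # (3,)
```

A result with two ranks means the quartile value is the average of the values at those two ranks.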
Sorry, I clicked too quickly. That is our quartile one value, and it is equal to 26.

To find quartile two, remember it's the same thing as your median. I'm not going to stress too much on the median because we have covered it. The formula is (n + 1) / 2. We take (20 + 1) / 2, and it is at position 10.5. Because it's a fractional half, the quartile is located between two positions, position 10 and position 11. So we count one, two, three, four, five, six, seven, eight, nine, 10, 11, and it is between 28 and 28. We need to take an average. Even if the two values were different numbers, we would still take both of them. Here we add 28 plus 28 together and divide by two, and that gives us the median, or quartile two value, which is equal to 28. If the second value had been 29, we would say 28 plus 29, divide by two, and our quartile two value would have been 28.5.

To find the third quartile, we use 3(n + 1) / 4. So it's 3 times (20 + 1), divided by 4, which is 63 divided by 4, and we find the position 15.75. Because it ends in point 75, we round up, since it is closer to 16 than to 15. I made a mistake on my slide here. This should have been 16. We round up, so it is at position 16, and we count. My counting on the slide might also be wrong, so: one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16. So there is our quartile three value, and that changes everything on my slides. Bear with me. It means what I display on my slides from here on will be wrong wherever quartile three is involved, because I used 35 instead of 36. So quartile three is at position 16, and its value is 36.
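The three position formulas used above can be written as a short Python sketch. The function name `quartile_positions` is made up for this illustration; the formulas are the (n + 1) ones from the session.

```python
def quartile_positions(n):
    """Positions of Q1, Q2 and Q3 in sorted data of size n, using the (n + 1) formulas."""
    q1 = (n + 1) / 4
    q2 = (n + 1) / 2
    q3 = 3 * (n + 1) / 4
    return q1, q2, q3


# The age data in this session has 20 records.
print(quartile_positions(20))  # (5.25, 10.5, 15.75)
```

These are positions in the sorted data, not quartile values; the values still have to be read off at those positions.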
I was going to quickly fix this so that we are not confused, but it might take me too long. Okay, I'm just going to ignore that and continue. You all know that it is 36. I'm not going to change my slides right now while I am busy; we'll just amend them as we go along. So we know that the value is 36. That is our quartile three.

Now, when you write your assignment, or when you are in the exam, always bear the following in mind. They might ask you: what is the first quartile position? What is the first quartile value? What is the median? What is the median position? What is the third quartile value? When they ask you about the position, do not use the rounded-off position. Use the value that you get as an answer from the formula, because that is where the position is. It is only when we go and locate the value that we round off. Sometimes they might be tricky and give you the rounded-off position, but usually the position will be the value that you calculated using the formula. When they say "what is the quartile" or "what is the first quartile", they are asking you to find the quartile value. If they want the position, they will ask "what is the first quartile position". You must bear that in mind.

Let's use percentiles to find the same information. I'm going to do this all day long. So what are the percentiles? Remember, it's the same idea. When we use quartiles, we are dividing the data into parts of 25 percent each. So for quartile one we can use what we call the 25th percentile, P25, which is the value that 25 percent of the data falls below. We calculate its position by using 0.25 times (n + 1). We always add one to the number of observations.
And the position will be 0.25 times (20 + 1). If you have a calculator, you can tell me what the values are. I don't have my calculator close by, but I have this one. Let's hope it will not disappoint me. 0.25 times 21 equals 5.25. So the answer is 5.25, which means the fifth position. It's the same as before. We estimate that it is at position five, and we count one, two, three, four, five. That is the 25th percentile of the data.

It is similar with quartile two, which has 50 percent of the data below it. So the second one, which is P50, is at 0.5 times (n + 1), which is 0.5 times (20 + 1), which is 0.5 times 21, and that equals 10.5. I'm not going to rewrite this. It's between position 10 and 11. We just count one, two, three, four, five, six, seven, eight, nine, 10, 11. It's located between 28 and 28. So we take the average of the two, and the percentile is 28.

Let's look at the third quartile, which is P75, which has 75 percent of the data below it. Its position is 0.75 times (n + 1), which is 0.75 times (20 + 1), which gives you 0.75 times 21, which equals 15.75.

I'm sorry. Sorry, Lizzie.

Yes?

Are you writing something? I see your cursor moving, but it's not writing on the screen.

It's not writing? Oh, I'm so sorry. And now, are you able to see?

Yes, now I can see. Thank you. So sorry, blackout.

Okay, no problem, but you were listening, so I'm going to assume that you heard what I just said. I was writing everything that is written in red, with a red pen. Okay, so we were on the 75th percentile. It's at 15.75, which we know is position 16 when we count, and the value at that position is the 75th percentile.

So that is how you can use the percentiles instead. If you think that you won't remember (n + 1) divided by four, (n + 1) divided by two, and three times (n + 1) divided by four, then you can use this. You can use the percentiles.
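As a quick check that the percentile route gives the same positions as the quartile formulas, here is a minimal Python sketch. The helper name `percentile_position` is an assumption for this example, and it takes the percentile as a proportion (0.25, 0.5, 0.75), the way it is used in this session.

```python
def percentile_position(p, n):
    """Position of percentile p (a proportion such as 0.25) in sorted data of size n."""
    return p * (n + 1)


n = 20  # number of records in the age data
for p, label in [(0.25, "Q1"), (0.50, "Q2"), (0.75, "Q3")]:
    print(label, percentile_position(p, n))
# Q1 5.25
# Q2 10.5
# Q3 15.75
```

The positions match (n + 1) / 4, (n + 1) / 2 and 3(n + 1) / 4, so either set of formulas can be used.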
Remember, it's 25, 50 and 75. They create the three quartiles. Are there any questions? If there are no questions, then we move on to what we call the interquartile range.

Remember, when we dealt with measures of dispersion, we spoke about the range of the data, which runs from the smallest value to the highest value. The difference between the two gives us the range of the data. With the quartiles as well, we can calculate what we call an interquartile range, which tells us the range of your quartiles. And remember, the quartiles are just quartile one, two and three, so the range will be between quartile three and quartile one, the highest quartile and the smallest quartile. It measures the spread of the middle 50 percent of your data. It just gives you the mid-spread of your data based on the quartiles. It is another measure of variability, or dispersion, and it is not influenced by outliers, because the outliers are on the outside. We're concentrating only on the box, the data between quartile one and quartile three.

So how do we calculate it? Like with the range, it is the difference between your highest and your smallest quartile. From our data, and gosh, I've got so many mistakes on my slides today. That is what you get for making the slides in the morning: you will make lots and lots of mistakes. That should be 36, and this one should also be 36. I'm just going to move on and leave it alone.

Okay, we are back. So we calculated the quartiles, right? We know that quartile three is 36 and quartile one is 26. To calculate the interquartile range, we say quartile three minus quartile one, which gives us 36 minus 26, which is equal to 10 years. And we can interpret it like this: the ages of the middle half of the respondents in this sample span a range of 10 years.
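To make the difference between the range and the interquartile range concrete, here is a minimal Python sketch using only the values quoted in this session for the age data (minimum 18, Q1 26, Q3 36, maximum 45). The variable names are chosen for the illustration.

```python
# Values quoted in the session for the age data (20 respondents).
minimum, q1, q3, maximum = 18, 26, 36, 45

data_range = maximum - minimum   # spread of all the data
iqr = q3 - q1                    # spread of the middle 50 percent

print(data_range)  # 27
print(iqr)         # 10
```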
That is the range of the quartiles, right? From the quartiles and the data set, we can create what we call a five-number summary. A five-number summary is made up of five numbers: the three quartiles, the minimum value and the maximum value. Because your data is sorted, you just take your smallest value and your highest value. So it's your minimum, quartile one, quartile two, which is the same as the median, quartile three, and then the maximum value.

You always need to remember this. The reason I repeat it so often is that you should not forget that quartile two is the same as the median. Sometimes they will ask you to calculate the median, which is the same as quartile two. Sometimes they will ask you to calculate quartile two. Sometimes they will ask you whether quartile two and the median are the same thing. Those five numbers are the five-number summary.

From the five-number summary, we can draw what we call a box plot. So let's look at our data set. 18 is our minimum value, 26 was our quartile one value, 28 was our median, 36 was our quartile three, and we also have a maximum value, which is 45. From there we can draw what we call a box-and-whisker plot. The box is drawn from the quartile values, from quartile one to quartile three, with the median as a line inside it. The whiskers extend out from the box, and each whisker covers 25 percent of the data. The tip of the whisker tells you where your data ends. If there is another record out here, beyond the line, which is not part of the rest, we represent it with a dot because it's an outlier. This will be an outlier, or what we call an extreme value, because it's very far from the rest of the values.
For example, if we have another value here, which is 100, which is far from the other values, we just represent it with a dot. And we would also call it an extreme value if it were on the other side. But this is what we call a box-and-whisker plot. And on that note, that concludes my session today.

But before I conclude, before we do some activities: from the quartiles you can also see the distribution of your data. The box plot can tell you whether your data is symmetrical or skewed, because of where the median sits in the box. If the median line comes closer to the 25 percent mark, quartile one, it means your data is positively skewed. If the line comes over here, closer to quartile three, it means your data is negatively skewed. If I look at this plot, it might mean that my data is symmetrical, because the line is somewhere in the middle of the values. And if I calculate the mean of these values, it might be around 28. I'm not sure. You will need to test it if you want to find out whether this data set is symmetrical or not, because it's not that clear from the plot.

So, to test whether your data is left skewed, symmetrical or right skewed, remember that left means negatively skewed and right means positively skewed. In other words, the tail is to the left if it's negatively skewed, and to the right if it's positively skewed. You can use the median, which is quartile two, and the smallest value, and take the difference between the two. If that is bigger than the difference between the largest value, which is your maximum, and the median, then your data is left skewed. Or you can use quartile one and the smallest value, and take the difference between the two. If that is greater than the difference between the largest value and the quartile three value, then your data is left skewed as well.
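The two comparisons just described can be sketched in Python as follows. The function names are made up for this illustration. Only the left-skew condition is spelled out in the session; treating the mirror-image case as right skewed and equal gaps as symmetric is an assumption added here.

```python
def skew_direction(lower_gap, upper_gap):
    """Compare a lower-side distance with the matching upper-side distance."""
    if lower_gap > upper_gap:
        return "left-skewed"
    if lower_gap < upper_gap:
        return "right-skewed"   # assumed mirror image of the left-skew rule
    return "symmetric"          # assumption: equal gaps are read as symmetric


def skew_checks(minimum, q1, median, q3, maximum):
    """Apply the two comparisons from the session to a five-number summary."""
    return {
        "median vs extremes": skew_direction(median - minimum, maximum - median),
        "quartiles vs extremes": skew_direction(q1 - minimum, maximum - q3),
    }


# Five-number summary quoted for the age data: 18, 26, 28, 36, 45.
print(skew_checks(18, 26, 28, 36, 45))
# {'median vs extremes': 'right-skewed', 'quartiles vs extremes': 'right-skewed'}
```

On these two comparisons alone the upper side of the age data is the longer one (10 against 17, and 8 against 9), so this rough check leans toward a right skew rather than symmetry. It is only a rule of thumb and does not replace looking at the full data.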
So you can go through all of them to check whether your data is skewed or not. And that is the relationship between the five-number summary and the shape of your distribution.

That concludes me talking; now you do the work. So let's do the activities and see if you still remember what we discussed. Even with the few errors on my slides, I hope it didn't throw you off.

Let's look at the activities. I am going to do the first question with you, and probably the second one as well, but you will have to do the rest of the activities by yourself. The daily consumption in kilowatt-hours of a sample of 10 households is 51, 50, 47, 33, 37, 43, 61, 55, 44, 41. Which one of the following statements is incorrect? Remember, it's a multiple choice question. It means we need to evaluate each and every statement, from statement one up to statement five.

Option one says the position of quartile one is 2.75. So they are saying: go and find (n + 1) / 4 and check whether it is 2.75. Or use 0.25 times (n + 1) to find out whether it is 2.75.

Option two says the median is 40. That means you need to find the position, (n + 1) / 2. Once you have the position, go and find the value, because that's what they asked for. They don't just want the position; they want the median value. You can also use 0.5 times (n + 1).

Option three is the value of quartile two. They are also looking for a value. I'm not going to repeat what I've just said. I told you that quartile two is the same as the median, right?

Option four is the range of the data. Remember what the range is. If you forgot, it is the highest value minus the lowest value; not the interquartile range yet. They're asking you to find the range of your data.
Option five is the value of quartile three. They're looking for the value. So we first need to find the position by using 3(n + 1) / 4, or you can use 0.75 times (n + 1) to find the position.

What must you do first, before you even go and find the position? What is the first step? Sort the numbers. Let's sort the numbers. I hope you have already started sorting yours. I'm going to write the numbers in silence.

I've sorted my numbers. Are we good? Let's do option one now. We just substitute the values. We were told there are 10, so I don't even have to count how many there are. So 11 divided by four is position 2.75, which means option one gives the correct position. Or you can use the percentile formula; you will still get the same answer.

Option two says the median is 40. We find the position first. I'm going to use the percentile, so it's 0.5 times (10 + 1), which is 0.5 times 11, which is position 5.5. What does that mean, if it's at position 5.5? Anyone? It's between positions 5 and 6. So one, two, three, four, five, and six. It's between 44 and 47. We need to add 44 and 47 and divide by two to get the value of the median. So 44 plus 47 equals 91, divided by two equals 45.5. Option two said the median is 40, so this one is incorrect.

The value of quartile two we have already calculated. Remember, the median and quartile two are the same, right? I've been saying it. We found that it's at position 5.5, which gives 45.5. Therefore, quartile two is 45.5.

The range of your data: the lowest value is 33 and the highest value is 61. So it's 61 minus 33, which is equal to 28. That means that one is correct.

Quartile three value: we first need to calculate the position. I'm going to use the percentile, which is 0.75 times (10 + 1), which is 0.75 times 11, which is 8.25. What must I do with 8.25? Do I round up or round down? Round down. We round down.
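The checks for this first activity can be reproduced with a short Python sketch. The helper `value_at` is a hypothetical name for this example; it applies the session's rounding rule (average on .5, round down on .25, round up on .75) and is not a standard library routine.

```python
import math


def value_at(sorted_data, position):
    """Value at a 1-based quartile position, using the session's rounding rule."""
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Average the two neighbouring values.
        return (sorted_data[lower - 1] + sorted_data[lower]) / 2
    rank = lower if fraction < 0.5 else lower + 1
    return sorted_data[rank - 1]


consumption = sorted([51, 50, 47, 33, 37, 43, 61, 55, 44, 41])
n = len(consumption)

q1 = value_at(consumption, (n + 1) / 4)        # position 2.75 -> rank 3
median = value_at(consumption, (n + 1) / 2)    # position 5.5  -> ranks 5 and 6
q3 = value_at(consumption, 3 * (n + 1) / 4)    # position 8.25 -> rank 8
data_range = consumption[-1] - consumption[0]

print(q1, median, q3, data_range)  # 41 45.5 51 28
```

The median comes out as 45.5 rather than 40, which is why the statement about the median is the incorrect one.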
Therefore, it's at position 8. So we count 1, 2, 3, 4, 5, 6, 7, 8. It's 51, which makes that one correct. Easy, right? If you get questions like this in the exam, they will be quick and easy to calculate.

The data below shows the production cost, in thousands of US dollars, of 10 autonomous vehicles at Tesla Inc. Calculate the interquartile range for the cost of production and choose the correct answer from the list.

Now, what you always need to remember is the formula. So IQR, I'm going to write it here: the IQR is equal to your quartile three value, the value and not the position, minus the quartile one value. That means we have to do certain things. We first need to calculate what quartile three is and what quartile one is. The quartile three position is 3(n + 1) / 4, and the quartile one position is (n + 1) / 4.

We also need to sort the data. Looking at this data, it's in ascending order, right? We wish everything in life came as easily as this. So my data is already prepared and I can just do the calculation. It's 3 times (10 + 1), divided by 4, because there are 10 values. They told us there are 10, or you can count how many there are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. And we calculate. So it is 3 times 11, which equals 33, divided by 4. The answer is 8.25. Because it's 8.25, we round down to 8. 1, 2, 3, 4, 5, 6, 7, 8. And our quartile three value is 2286.

I need to go and find my quartile one value. It's (10 + 1) divided by 4. It's position 2.75. What must I do? We round up. It is at position 3. 1, 2, 3, which is 1263. And the answer is 2286 minus 1263, which equals 1023. And that is option E. That's how easy it is.

Now I'll give you another exercise to do. Let's see if you can answer this one on your own. Consider the following data set: 4, 14, 6, 9, 23, 3, 7 and 10. Which one of the following statements is correct? The median of the sample is 8. The first quartile is 4. The third quartile is 14.
The mean is 9.5. The distribution is symmetric.

The first step you need to do is sort the data. After sorting the data, you find the positions. The median is the second quartile, so its position is (n + 1) / 2. For the first and third quartiles, they are looking for the values, not the positions, so I'm going to give you the formulas for the positions. You find the position and then go and find the value. The first quartile position is (n + 1) / 4, and the third quartile position is 3(n + 1) / 4. The mean, we know, is the sum of all the values divided by how many there are. Symmetric means the mean should be the same as the median; if they are not the same, the distribution is not symmetric. So this is your check: check whether the mean is the same as the median. The mean is in option 4. The median is in option 1. If they have the same value, then the distribution is symmetric.

Which one of the following statements is correct? Are we winning? I think this question was actually looking for which one is incorrect. There must be a typing error there as well. Which one is incorrect?

Okay, are we done? Let's see. We sort the data, so this will be 3, 4, 6, 7, 9, 10, 14, 23. For the median, it's (8 + 1) divided by 2, which is position 4.5. Just double check your data. Position 4.5: 1, 2, 3, 4, and a half, is between 7 and 9. We add 7 and 9 and divide by 2, which gives us 8. Did you get that correct?

The first quartile is at position (8 + 1) divided by 4, which is position 2.25. Did you get that right? So we estimate that it is at position 2. 1, 2, it's 4. And that is correct.

The third quartile is at 3 times (8 + 1), divided by 4, which is 9 times 3 divided by 4, 6.75. Did you get that? We estimate it is at position 7, because we round up. 1, 2, 3, 4, 5, 6, 7, it's 14, which is correct.

The mean is the sum of all the values divided by how many there are. If we add all the values, we get 76, and there are 8 of them.
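The mean-versus-median check for this exercise can be sketched with Python's standard `statistics` module. Comparing the two for exact equality follows the rule of thumb used in this session; it is a rough check, not a formal test of symmetry.

```python
import statistics

data = [4, 14, 6, 9, 23, 3, 7, 10]

total = sum(data)
mean = statistics.mean(data)
median = statistics.median(data)   # averages the two middle values for even n

print(total, mean, median)  # 76 9.5 8.0
print(mean == median)       # False
```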
76 divided by 8 is 9.5, which is correct. Let's check. Our mean is 9.5. Our median is 8. They are not equal. Therefore, the distribution is not symmetric. That means our incorrect answer is option 5. That's why I said that it's just a typing error. You will find certain questions like this in the assignment or exam, where you get typing errors. Do not panic. Send an email to your lecturer or your e-tutor to alert them so that they can check and correct them.

Somebody is online with me and has a question.

Can I just ask one question?

Yes, you may.

That first one, the median of the sample is 8. Can you just go over that? I've got up until 4.5. I just hit a blank on where you got that 7 plus 9 divided by 2.

So you count. We got the position of the median, which is 4.5. So we count 1, 2, 3, 4, and a half. It's between 7 and 9, so it's between two values. We take the average of the two values: 7 plus 9, divided by 2, gives you 8.

Thank you. Thank you.

Yes. Okay. I do have additional activities. There is one; you can take a screenshot of that. And there is another one; you can also take a screenshot. Remember, when you do these activities on your own, if you are struggling, we have a WhatsApp group. You can always go there and ask questions for assistance. Use the group to help you study. I'm not going to go through that one; that's the last one.

Introducing Pambili Analytics, which runs these free sessions. We are here to support you in your journey as you study to complete your qualification. As Pambili Analytics, our mission is to bridge the gap in terms of data literacy and analytical skills, and statistics is one of those. We also offer a range of services. I'm not going to touch on those. But we do offer skills development training as well as tutorial support, which we call instructor-led support, where we offer virtual sessions like the one we are having now, which is the free version.
But you can also request one-on-one sessions. Our rate is 150. It's still on special, probably until June, and then we revert to our normal price. If you want to learn more about data literacy, research methodologies or research designs, you can take one of our self-led online sessions. Or if you want to learn about programming in R or in Python, or creative visualization or business intelligence, you can take our self-led online training. Remember to always subscribe to, share, comment on or like our videos on YouTube. We've got the free version, where you can see all these free online sessions.

I would also like to give a shout-out to our loyalists. We have a number of people who support us, and among them are our loyalists, and we want to appreciate them for today's session. Those are Ojibambo and Noxklenwa. Thank you very much for your support, and thank you to all our supporters. Remember, you can also become a loyalist by paying a subscription of just 79.99 and receiving the video access and the notes.

Other than that, see you on Monday, for those who signed up for Monday's session. On Monday, we're going to start with probabilities. Bye. Enjoy the rest of your weekend.

Thank you, Lizzie. Bye.

Thank you, Lizzie. Thank you so much.

Thanks, colleagues. Bye.