In this video, we will continue discussing chart types. So far we have seen the bar chart, stacked bar chart, pie chart and histogram. Let us look at the box plot, scatter plot and line chart in this video.

A box plot is used to provide a summary of one or more numeric variables. For example, in the last video we saw 60 students' marks represented as a histogram. We can take the same students' marks and represent them using a box plot. The box plot gives more information and gives a feel for the data.

Here is a sample box plot using the data we discussed in the previous slide. This color indicates marks in subject A and this color indicates marks in subject B. There is a minimum value of 43 and a maximum value of 98, so the range of the students' marks is 43 to 98. Then there is the 25th percentile, that is Q1, the 75th percentile, that is Q3, and the median, which is the 50th percentile. The lower edge of the box is the 25th percentile and the upper edge of the rectangle is the 75th percentile. The 25th percentile means that 25 percent of the values lie below that particular value. That is, since we have 60 students, 15 students would have got marks less than 56.25, and 15 students would have got marks more than 75.25. The length of each part indicates the spread: the distance from 43 to 56 is small, so the deviation is less there, but the deviation is more here.

The middle line indicates the median value; the median is 64. If you arrange the numbers in ascending order and pick the middle number, that is the median. For 60 students, the median will be the average of the 30th and 31st students' values. So, the median is the 50th percentile of the marks.
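To make the numbers behind a box plot concrete, here is a minimal sketch in Python that computes the minimum, Q1, median, Q3 and maximum for a small list of marks. The eight marks are hypothetical, not the 60 marks from the lecture, and the helper name `five_number_summary` is made up for this illustration. Different tools interpolate percentiles in slightly different ways, so the quartiles a charting tool shows may differ a little from these.

```python
import statistics

# Hypothetical marks for eight students (not the lecture's data set).
marks = [43, 50, 56, 60, 68, 75, 80, 98]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" is one of several interpolation conventions;
    # it is an assumption here, not something the lecture specifies.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


summary = five_number_summary(marks)
mean = statistics.mean(marks)  # the value a box plot may mark with a cross

print("min, Q1, median, Q3, max:", summary)
print("mean:", mean)
```

With an even number of values, the median is the average of the two middle values, which is the same rule as averaging the 30th and 31st values for 60 students.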
That is, when we say we want to take the median score and select the students above it, we are selecting 50 percent of the students in the class. The median is Q2. So, this is Q1, this is Q2 and this is Q3; below Q1 is the first quarter of the students and above Q3 is the fourth quarter.

So, what is the average? The average is this cross mark. Can you see the cross mark? This cross mark is the average, that is 55.37 in this particular course, and in this subject the average is 61 marks. So, the box plot gives you the minimum value, the maximum value, the median and the average, and how the marks are distributed across all the students. For example, the 15 students in the fourth quarter, that is 75 to 98, show more deviation, but the 15 students between 43 and 56 show less deviation.

We can also use the same box plot to compare the marks in a second subject. For the same set of students, the marks in the second subject can be viewed here. The marks are almost similar: 40 is the minimum and 91 is the maximum, and 52 against 56 and 70 against 75 are comparable. So, the performance in subject A and subject B is roughly equal. You can also use the box plot to compare the performance of two classes. Say you are teaching the same subject to two classes, class A and class B. You can conduct a test, plot the marks in a box plot, and see from the figure whether one class is doing better than the other.

So, with a box plot you get the distribution of the data, not in the same way as a histogram, but you still get it. Details like the range and the deviation in each course can all be seen from the box plot.

In a box plot, we can also have outliers. For example, in this particular example, I changed three of the marks to 2, 5 and 8.
So, this is 2, this is 5 and this is 8; the middle one is 5. I added these three marks because they are well below the minimum values, that is 43 and 40. If there are outliers, the box plot will indicate them, because the difference from 43 to 8 is too large. If you include them, the deviation becomes really huge, while this value reduces only very slightly, say from 55 to 54. So, if you remove them as outliers, it will not change much. The chart therefore treats these three marks as outliers, and the outliers can be seen here. Similarly, there can be outliers at the upper end, near the maximum mark of 98. Suppose most of the marks here went up to about 80 and only two or three students got around 91 or 92; those would be shown as outliers.

When I was talking about processing the data, I mentioned that you have to be careful with outliers. In some research, you might want to remove the outlier data, or you might want to consider why these students are outliers: why some students got extra marks, or why some were not able to perform well. There might be students who did not do well in the exam because they were not feeling well that day, although their attendance is really good. If you want to find a correlation between attendance and marks, you do not want to use those marks. That is why outliers can be removed: you know these students performed poorly not because they did not attend class or did not understand, but because they were unwell. So, you should be careful with outliers.

The box plot is one plot that tells you about outliers. There are other charts that can show outliers, but the box plot is the easiest one for getting a feel for the data and also for the outliers.

The data itself can also be added on the box plot. The box plot does not just summarize the data in a rectangle; you can also show the individual data points around it.
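The lecture does not say how the chart decides which marks are outliers. A common convention, assumed here, is the 1.5 × IQR rule: any value more than 1.5 times the interquartile range below Q1 or above Q3 is flagged. The sketch below applies that rule to hypothetical marks; only the three low marks 2, 5 and 8 mirror the lecture, and the function name `split_outliers` is made up for this illustration.

```python
import statistics

# Hypothetical marks for 20 students. The values 2, 5 and 8 mimic the three
# low marks from the lecture; the rest are invented for illustration.
marks = [2, 5, 8, 43, 47, 50, 52, 55, 56, 58,
         60, 62, 64, 66, 68, 70, 72, 75, 80, 95]


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the k * IQR fences.

    k=1.5 is a widely used convention, assumed here; the lecture does not
    state which rule its charting tool applies.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    kept = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return kept, outliers


kept, outliers = split_outliers(marks)
print("outliers:", outliers)
print("range without outliers:", min(kept), "to", max(kept))
```

Flagging a mark this way only identifies it. Whether to remove it still depends on the research question, as in the attendance example.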
Each data point plotted will look like this, so the data around the box will look like that. The distribution of the data can all be plotted in the same chart to get more of a feel for the data. It is not required; it is just that you can add the data points on the box plot.

Let us look at the scatter plot. A scatter plot is plotted across 2 or 3 axes; usually 2 axes are easier to compare. It is used to understand the relationship between 2 variables, not just the distribution. For example, take the marks in course A and course B. In the last slide, the course A and course B marks were shown in a box plot. Here I am plotting the marks of course A and course B, that is subject A and subject B, in a scatter plot. I am not using all 60 students' data, because that would not lead to a readable figure. So, I removed most of the data and kept only 15 data points.

The scatter plot shows that the student who got 40-something marks, say 45, in subject A got around 25 marks in subject B. Similarly, a student who got, say, 65 got around 35. But the student who got 80 marks got around 75, and another student who got 65 in subject A also got around 75 marks in course B.

So, why are we plotting this scatter plot? I want to understand the relationship between the marks in course A and course B: whether a student who does well in course A also does well in course B, or whether a student who scores, say, 45 in this course is not able to do well in the other course. Another student may be doing okay in this course but can score up to 90 marks in the other course. So, we might want to know why these students are not able to do well in this course, or why this student is not able to do well in that course.
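One way to put a number on the relationship a scatter plot shows is a correlation coefficient. The lecture only reads the relationship off the chart, so using Pearson's r here is an assumption made for illustration, as are the paired marks and the function name `pearson_r`. This is a minimal sketch:

```python
import math

# Hypothetical paired marks for eight students (invented for illustration):
# course_a[i] and course_b[i] belong to the same student.
course_a = [45, 50, 55, 60, 65, 70, 80, 85]
course_b = [25, 40, 35, 55, 75, 60, 75, 90]


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: fails with ZeroDivisionError if either list is constant.
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(course_a, course_b)
print("correlation between course A and course B marks:", round(r, 2))
```

A value near 1 means students who score high in one course tend to score high in the other. The individual points that sit far from that trend are the students worth a closer look.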
So, to understand the relationship between these two variables, that is the marks in course A and course B, we can use the scatter plot. You can also do a scatter plot of attendance versus performance, or of students' engagement versus performance, or of many other variables.

So, let us move on to the next chart, the line chart. In a line chart, lines represent the trend of a variable over time. You might have seen line charts everywhere. The line chart is very common, it is very useful for tracking multiple variables over time, and it is very easy to understand; that is why it is used so commonly.

Let us look at the line chart of the average absentee rate from grade 9 to grade 12 for boys and girls. It is not over time; it is over different grades. A line chart need not always be over time; it can also be used to check how a percentage varies over different grades. We saw similar data in the bar chart example and in the stacked bar chart. The same values can be plotted as a line chart. In this chart, it is easy to see that the girls' average absentee rate is always below the boys' average absentee rate, and that the average absentee rate increases with every grade for both girls and boys.

So, you have seen the box plot, the line chart and the scatter plot. What is the difference between a box plot and a line chart, and when do we use a box plot instead of a line chart? A line chart is used to track progress over time, and a box plot provides a summary of the data at one particular time, say class A students' marks or grade 9 boys' attendance rate. The box plot gives more sense of the data, like the minimum value, the maximum value, or the distribution of the data in a particular grade or class. But we can also combine both: there can be a chart that combines the line chart and the box plot. For example, for grade 9, we saw there are 2 values.
You can have a box plot here, and a different box plot here, something like that. So, for grade 9 there are 2, and you can have a set of box plots with the change in trend also shown using the line chart. That is more helpful for understanding the trend across grades in the boys' and girls' absentee rates, and you also get a sense of the data in each class. So, you can combine these basic charts to create a new chart. It is not that you have to use only one chart for a representation; you can combine charts and make a new chart that is easier to understand, gives more sense of the data, and allows more inferences from the data.

So, in this video, we talked about the box plot, scatter plot and line chart. Thank you.