This week we saw two clustering techniques for diagnostic analytics. Now we will see where they are used in research papers, with a sample paper for each. For K-means clustering, I selected a paper from the Journal of Computing in Higher Education. Here the learners are clustered based on their engagement with the MOOC environment; the paper is about clustering patterns of engagement in massive open online courses.
So, what is learners' engagement? They measure it in the MOOC environment using four metrics: reading frequency, that is, how many times the student read posts in the forum; writing frequency, how many times the student wrote in the forum; how many videos the student watched; and how many times the student attempted the quiz. So for forum reading, forum writing, video watching and quiz attempts, they computed the frequency of each student's interaction with the MOOC. If you have a hundred students, each of them comes in with these four numbers.
Now there are four variables. The clustering example we saw in the last class had only two variables, so you could plot the data and look at it, but in four dimensions we cannot do that. What we can do is apply the K-means clustering algorithm, and that is what they did on all this data. They used Euclidean distance and tried K values from 1 to 15. As K increases, the within-group sum of squares, that is, the objective function J, keeps reducing. But they chose K equal to 4, because the elbow is at this point: from 4 to 5 the reduction in the error function is very, very small. You could also argue for K equal to 6; whether you choose 4 or 6 depends on your research questions and on what the data gives you. Here they chose K equal to 4.
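To make the elbow idea concrete, here is a minimal sketch in plain Python. It is not the paper's code or data: the four engagement profiles, the group sizes and the noise level are invented for illustration, and the farthest-point seeding is an assumption chosen only so that the run is reproducible. It runs K-means with squared Euclidean distance for K from 1 to 15 and prints the within-group sum of squares (WSS) for each K.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then keep adding the point farthest from the seeds chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, within-group sum of squares)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each student goes to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    wss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wss


# Hypothetical data: (forum reads, forum writes, videos watched, quiz attempts).
rng = random.Random(0)
LOW, HIGH = 5.0, 35.0
profiles = {
    "low on everything": (LOW, LOW, LOW, LOW),
    "high on everything": (HIGH, HIGH, HIGH, HIGH),
    "quiz-heavy": (LOW, LOW, LOW, HIGH),
    "writing-heavy": (LOW, HIGH, LOW, LOW),
}
students = [
    tuple(rng.gauss(mean, 1.0) for mean in profile)
    for profile in profiles.values()
    for _ in range(50)
]

# WSS for K = 1..15; the elbow is where the reduction suddenly becomes small.
wss_by_k = {k: kmeans(students, k)[1] for k in range(1, 16)}
for k in range(1, 16):
    drop = wss_by_k[k - 1] - wss_by_k[k] if k > 1 else float("nan")
    print(f"K={k:2d}  WSS={wss_by_k[k]:10.1f}  reduction={drop:10.1f}")
```

On data built from four well-separated groups like this, the reduction in WSS should be large up to K equal to 4 and small afterwards, which is the pattern the authors read off their plot. With real engagement data the elbow is usually less sharp, and in practice a library implementation of K-means would normally be used instead of hand-written code.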
So, that is why they use K equal to 4. Let us see what K equal to 4 means. Here, they looked at the students in each cluster. There are four clusters, and each cluster gives you a set of students: cluster one may have some 20 students, another cluster may have more, something like that. Then they looked at the data, that is, the behavior of the students in each cluster. Remember, this is diagnostic analytics, not predictive analytics. So we do not know why a student got a low score or why a student was not able to clear the quiz.
So we cluster, and you see there is one cluster where all the students show low activity on all four metrics: they are not reading posts, not writing forum messages, not watching videos and not attempting quizzes. They are not really interested in the course, so they are called dropouts. How do we come up with names like "dropout" or "perfect student"? It is up to you: as the researcher, you give the names based on the data, your domain expertise and your understanding. Here they gave the name "dropout" because all four metrics are low, so these students are most likely to drop out. Whether these students actually go on to drop out is something we can check in predictive analytics; in that case the cluster can be treated as a class for a classifier.
The second cluster is the perfect students. They are highly engaged on all four metrics, especially in accessing video lectures and reading a lot. So they are perfect students; they might do well and they might continue the course till the end.
Then there are some students who are gaming the system. They are highly engaged in the number of quiz attempts and very, very low on video watching, which means they do not care about what is delivered in the content. They are very confident: "I can solve all the quizzes, I already know all the content here, so I will go and attempt the quiz directly." They are gaming the system.
They might succeed or they might not; we do not know. We do not know about the success part here, but based on their interaction we can say that they are gaming the system.
Finally, there are some students who are social. They are the ones who write a lot in the discussion forum and do not care much about other interactions such as watching videos or reading others' posts. They are the ones who are actually writing in the discussion forums, while the others are not even writing. They are not watching videos much, but they form a kind of social group.
So we are grouping the students into these four clusters. It is up to the researcher to decide which metrics to consider for clustering and how to make inferences from the clusters. The K-means algorithm will help you find the clusters, that is it. The technology will help you decide whether to go for four clusters and how to group the students from the data, but choosing the right parameters to create the clusters and making inferences from them is up to the researcher. That is why we say domain expertise is needed. In this course, I would like you to build that expertise, and not only learn how to use or apply the algorithm.
Now let us look at an example of hierarchical clustering. In this paper, from LAK 2019, the authors used agglomerative hierarchical clustering to model learner participation profiles in online discussion forums. Again, this is about online discussion forums: based on the students' participation in the forum, they create the clusters using AGNES. They took data from online discussion forums and divided it into two groups, reading and writing. So reading is a separate activity and writing is a separate activity, and for each of them they selected some data points.
For example, for writing they used the ratio of threads, that is, how many threads were started by the learner compared to all the threads started in the discussion forum; how many times the learner replied to a post; and the number of active days, that is, out of the days the learner logged in, on how many days he really created a thread or replied to a post or something like that. So only the days on which the learner actually participated in the discussion forum are counted as active days. In this way they created four parameters for writing, and similar parameters were computed for reading. So now they have data for reading and for writing separately. Using this four-dimensional data for all the students, they computed the agglomerative clustering.
Let us look at the clustering results. For writing group one, W1, there are very few students and they all end up forming 21 clusters, so no good clustering comes out because they are all very distinct. For writing group two, W2, there are distinct cluster groups, and three clusters come out. So, based on their writing activity, these students are clustered into three clusters. These could be combined further, as agglomerative clustering does, but the authors stopped here.
Now look at the height. The height in the dendrogram tells you how far apart the clusters are. You find the distance between two clusters using one of the similarity measure functions, and the function they use here is complete link. Complete link takes the distance between two clusters to be the distance between their two farthest points, one from each cluster. So with complete link, the height at which two clusters are joined in the dendrogram indicates the distance between those clusters.
Similarly, for reading behavior they obtained four clusters.
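The agglomerative procedure with complete link described above can be sketched as follows. This is a small illustration in plain Python, not the authors' implementation: the nine learners and their four writing features are made up (values scaled between 0 and 1), and the function names are hypothetical. The recorded heights correspond to the heights at which clusters are joined in a dendrogram.

```python
import math


def complete_link(cluster_a, cluster_b, points):
    """Complete link: distance between the two farthest members,
    one taken from each cluster."""
    return max(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerate(points, n_clusters=1):
    """Start with one cluster per learner and repeatedly merge the closest
    pair of clusters until n_clusters remain. Returns (clusters, heights),
    where heights are the complete-link distances of the merges, in order."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > n_clusters:
        height, a, b = min(
            (complete_link(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        heights.append(height)
    return clusters, heights


# Hypothetical learners; each row holds four illustrative writing features,
# e.g. share of threads started, share of replies, share of active days,
# plus one further unspecified writing measure.
learners = [
    (0.0, 0.0, 0.1, 0.0),
    (0.1, 0.0, 0.1, 0.1),
    (0.0, 0.1, 0.0, 0.0),
    (0.5, 0.5, 0.5, 0.4),
    (0.5, 0.4, 0.6, 0.5),
    (0.4, 0.5, 0.5, 0.5),
    (0.9, 1.0, 0.9, 1.0),
    (1.0, 0.9, 1.0, 0.9),
    (0.9, 0.9, 1.0, 1.0),
]

# Stopping at three clusters is like cutting the dendrogram at that level.
three_clusters, _ = agglomerate(learners, n_clusters=3)
print("Clusters:", sorted(three_clusters))

# Merging all the way to one cluster gives every merge height.
_, all_heights = agglomerate(learners)
print("Merge heights:", [round(h, 3) for h in all_heights])
```

In the printed heights, the early merges inside each group happen at small heights and the last two merges happen at much larger heights; that jump is what suggests where to stop merging. For real data, a library routine for hierarchical clustering would normally be used, and it would also draw the dendrogram.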
There are a lot of students here, and they created four clusters. What exactly does four clusters mean? It means they stopped merging at this level. These clusters could be combined again into two clusters, and again into one cluster, but they picked four as the value. So they labelled these groups of students as behavior one, behavior two, behavior three and behavior four; those are the four clusters they wanted to analyze. So this is an example of hierarchical clustering used on data. You can also apply this kind of clustering algorithm to your data for diagnostic analytics, to understand why students behave the way they do.
Since we have now seen two papers, one on K-means clustering and one on hierarchical clustering, we ask the same question we discussed at the beginning of this week, in the first video: can you list down two applications of clustering in any learning environment? Now you know what hierarchical clustering is and what K-means clustering is, and you have seen their application in two papers. Can you list them down? After listing them down, resume the video to continue.
There is no model answer for this activity. You have to compare your response to the similar activity in the first video with your response to this last activity, and see whether there is an improvement or not. If there is some change, and you understand how clustering can be used in different learning environments, that is good: you have learned clustering. If there is no change, then either you already knew what clustering is and had already applied it in the right environment, or you did not really understand clustering from this week's videos. In that case, I request you to go and read or watch content on clustering online. Thank you.