In this video we will discuss what hierarchical clustering is. In the last video we discussed K-Means clustering, which is a centroid-based clustering; hierarchical clustering is a connectivity-based clustering. There are two major types within hierarchical clustering. One is agglomerative, called agglomerative nesting, or AGNES; we will see that in detail in this class. The other one is divisive analysis, called DIANA. So there is divisive hierarchical clustering and agglomerative hierarchical clustering. For AGNES, we build a tree from the bottom up. We will see what the tree is and what bottom up means, and we will stop when we reach the root. DIANA is just the opposite: we start from the top and stop when we reach all the data points, all the children in the given sample. So we will see what exactly the bottom-up and top-down approaches are, and what the difference between AGNES and DIANA is. We will discuss only AGNES in this course; it is simple enough that from it you can understand how the other clustering algorithm works. Let us see hierarchical clustering. This is the same data set, but some points were removed just to make it simple. The idea of hierarchical clustering, because it is bottom up, is that you start by treating each and every data point as its own cluster. So the number of clusters here is 8: every data sample is a cluster. Then you merge the most similar points. These two are very similar, so maybe they become one cluster, something like that. You merge the most similar points at each step, then the next most similar at the following step, until you reach the last step where everything belongs to one cluster. So you build multiple levels. But how do we merge the most similar points?
There should be a similarity measure. Last time we saw the Euclidean distance, used to measure the distance between a centroid and the data points to decide which cluster a point should go to. Similarly, here we also need a similarity measure. There are various formulas for similarity, a lot of formulas; it is not important to know all of them here, but understand that there is some similarity function which measures the similarity between two points. If this point is more similar to this point than to that point, those two will be merged first, as step one. That is how the similarity measure is used. Let us see how this works in multiple steps. In the first step, I considered every data point as a cluster, so we start with eight: all eight of them are clusters. In step two, we merge the most similar data points. By some measure, these two are more similar than those two, so this is one cluster, this is another cluster, this is a third cluster. Now we have five clusters: one, two, three, four, five. We have five clusters at step two. Next, this cluster and this cluster are similar compared to the other clusters, so they form another cluster; by some similarity measure we now have three clusters at step three. At step four we might have another cluster combining these two, so now we have two clusters, and at step five we have one big cluster. Note that the number of steps does not depend on the number of data samples; instead, it depends on the behavior of the data. It is not that eight data samples must take exactly five steps: eight points can converge in six or seven steps, or in three, depending on the data. So this is hierarchical clustering; specifically, this is called agglomerative clustering.
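The step-by-step merging described above can be sketched in code. This is a minimal sketch using SciPy, one possible tool for this; the eight 2-D points are hypothetical stand-ins for the eight points in the lecture, chosen so the closest pair is unambiguous.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: eight 2-D points arranged as four close pairs.
X = np.array([
    [1.0, 1.0], [1.1, 1.0],   # closest pair (distance 0.1)
    [4.0, 4.0], [4.2, 4.0],   # next closest (distance 0.2)
    [8.0, 1.0], [8.3, 1.0],   # distance 0.3
    [4.0, 8.0], [4.4, 8.0],   # distance 0.4
])

# Each row of Z records one merge step: the two clusters joined, the
# distance at which they were joined, and the size of the new cluster.
Z = linkage(X, method="single", metric="euclidean")

print(Z.shape)         # (7, 4): n - 1 merges for n = 8 points
print(Z[0, :2])        # indices of the first (most similar) pair merged
```

Here `method="single"` picks one particular similarity function between clusters; the lecture's point is that some such function must be chosen.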
Agglomerative clustering starts with individual data points and tries to build clusters by merging neighbors under some similarity measure. This clustering is described very easily with a dendrogram diagram. A dendrogram is another type of visual representation, which we did not cover in the descriptive analytics part because we were going to see it here; I thought not to discuss it multiple times. So what is a dendrogram? Note that this is not an actual dendrogram plotted from any library; I just drew it in PowerPoint. Let us see how this dendrogram works. If we give numbers to the points, let us assume 1, 2, 3, 4, 5, 6, 7 and 8. This is not an exact dendrogram, but let us see how it works. Suppose that among points 1 through 8, these two are the most similar by the distance measure, so they will be combined into one cluster; that is step 1. At step 1, these two are combined, and these two, and these two. At step 2, this cluster and this point form a new cluster; that is a new node in the tree. So if I draw a horizontal line here, after step 2, I should have three clusters: one, two, three. If I draw the line here, after step 1, I have one, two, three, four, five clusters. Initially I had eight clusters, then five, then, cutting at this level, three: 1, 2, 3. In the next step, these two are combined into a new cluster at step 3; then if I draw a line here I should have two clusters, and after step 4 there is only one cluster. So the dendrogram is how agglomerative clustering is represented in a visual format.
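The hand-drawn dendrogram in the lecture can also be produced from a library. The sketch below, using SciPy on hypothetical one-dimensional data, computes the tree layout; with matplotlib installed, calling `dendrogram(Z)` without `no_plot=True` would draw the tree itself.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical 1-D values standing in for points 1..8 in the lecture.
X = np.array([[1.0], [1.1], [4.0], [4.2], [8.0], [8.3], [12.0], [12.4]])
Z = linkage(X, method="single")

# no_plot=True returns the dendrogram layout as a dict instead of drawing.
d = dendrogram(Z, no_plot=True)
print(d["ivl"])   # leaf labels in left-to-right dendrogram order
```

Each horizontal merge in the drawn tree corresponds to one row of `Z`, and "drawing a line" at a height corresponds to cutting the tree at that merge distance.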
So, if you have data and you run the AGNES algorithm, you might get a dendrogram diagram like this for your data points. It tells you how the clusters are formed, and you can choose which level you want to cut at. Say I want three clusters: then these data points are considered one cluster, this is another cluster, and this is another cluster. You can decide where to cut. Here each data point is a student ID, because we are using each student's behavior, for example student one's final mark versus midterm mark. I just wrote sequential numbers like 1, 2, 3, but if you really look at the data points, the IDs might be different. So these three students belong to one cluster, some other students to another cluster, and another three students to a third cluster; we have formed three clusters. This is how hierarchical clustering can be used. To implement hierarchical clustering there are a lot of tools available; we will also discuss some of them. You give the data, compute the hierarchical clustering, and choose your level based on your data and your need. I hope you understood what a dendrogram is and what hierarchical clustering is. Now, can you list two drawbacks of the hierarchical clustering algorithm? After listing them, resume the video to continue. First, it is not easy to decide the number of clusters, but you do have the complete picture: what happens if the number of clusters equals 1, what if it equals 2, and so on. Because you have that complete picture, there is no need for an elbow method or an error function; there is no error function here, but since you have the complete figure, you know where to cut. The vertical extent of the lines is also important; I will explain what those lines mean in the next video.
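Choosing a level of the dendrogram ("I want three clusters") can be done programmatically. A minimal sketch, assuming hypothetical (midterm, final) scores for eight students in three rough groups; the data and the `average` linkage choice are illustrative, not from the lecture.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical exam scores: (midterm, final) per student.
scores = np.array([
    [35, 40], [38, 42], [33, 39],   # lower-scoring group
    [60, 65], [62, 63], [58, 66],   # middle group
    [88, 90], [90, 85],             # top group
])
Z = linkage(scores, method="average")

# "Draw the line" on the dendrogram so that exactly three clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)              # cluster id assigned to each student
```

The `criterion="maxclust"` option asks for at most `t` clusters, which mirrors cutting the dendrogram at the level that leaves three branches.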
Let us see: there are different similarity functions, such as single link, complete link, and average link. There are a lot of similarity functions; you should choose the right one, and some of them are sensitive to outliers. If there is an outlier, some similarity functions may not work well, so you have to understand which one to choose. That is one drawback; I did not discuss the functions here, but you can check which similarity functions are used in hierarchical clustering. The main problem is time complexity: it takes on the order of n squared comparisons at each step. If we have, say, 4 steps, that is roughly 4 times n squared work, n squared per step. If the first step has 8 clusters, that is 8 squared, which is 64, comparisons; the heaviest computation happens at step 1, then it combines into 5 clusters and takes about 5 squared comparisons, and so on. Over all the merge steps this adds up to a cost that grows much faster than linearly in the number of samples (the naive algorithm is roughly cubic), which is computationally very costly and time consuming. For a very small data set, say collected from 60 students with 4 or 5 features, do not worry about those things; for 60 students with 5 or 10 variables, this time is not really important. For very large data sets, where you want to do clustering in real time and give feedback in real time, the time matters. So when you talk about time, you also have to consider where you will be using this clustering technique: offline, or in real time. Hierarchical clustering has a couple of drawbacks, but it is easy and visually appealing, even though we still have to decide where to cut the clusters. In some research we use K-Means clustering, in other places we use hierarchical clustering. We will see an example of both K-Means clustering and hierarchical clustering in a research paper in the next video. I hope you understood what hierarchical clustering is, and thank you.
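The effect of choosing among linkage (similarity) functions can be explored directly. The sketch below uses hypothetical one-dimensional data: two tight groups plus one extreme outlier. With a two-cluster cut, each of the three common linkages ends up isolating the outlier in its own cluster here; on less separated data the linkage choice can change the result, which is why it matters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight groups plus one extreme outlier (hypothetical data).
X = np.array([[0.0], [0.5], [1.0],
              [10.0], [10.5], [11.0],
              [30.0]])               # outlier

results = {}
for method in ("single", "complete", "average"):
    # Cut the tree built with each linkage into two clusters.
    results[method] = fcluster(linkage(X, method=method),
                               t=2, criterion="maxclust")
    print(method, results[method])
```

Trying several linkages like this, and looking at whether an outlier dominates the top-level split, is one practical way to pick a similarity function for your data.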