Today, we are going to discuss classification algorithms based on distance. At the end of the session, the student will be able to demonstrate various distance-based classification algorithms and to compare them. What are these distance-based algorithms? They are based on the strategy that items mapped to the same class are more similar to one another than to items in other classes. So, constructing a class, we will see that all items within it are more similar to each other than to items of some other class. Similarity, also called a distance measure, may be used to identify the likeness of different items in your database. Using a similarity measure is simpler for classification since the classes are known well in advance; all classification problems define their classes in advance. If the definition is in the form of an information-retrieval query, the classification problem is to determine the similarity between each tuple and the query. First, we will use a very simple approach where we assume that each tuple Ti in the database under consideration is defined as a vector <ti1, ti2, ..., tik> of numeric values, and each class Cj is likewise defined by a vector <cj1, cj2, ..., cjk> of numeric values. So we define this method as follows: given a database D of tuples T1, ..., Tn, where each tuple Ti = <ti1, ti2, ..., tik> contains numeric values, and a set of classes C = {C1, ..., Cm}, where each class Cj = <cj1, cj2, ..., cjk> also consists of numeric values, the classification problem is to assign each tuple Ti to the class Cj such that sim(Ti, Cj) >= sim(Ti, Cl) for every Cl in C with Cl not equal to Cj. Concentrate on the definition: everything depends on a similarity measure.
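The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: similarity here is taken to be the negative Euclidean distance, and the function and class names are hypothetical.

```python
# Sketch: assign each tuple Ti to the class Cj that maximizes
# sim(Ti, Cj), with similarity defined (as one possible choice)
# as negative Euclidean distance between the numeric vectors.
import math

def similarity(t, c):
    """sim(Ti, Cj): larger means more alike."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(t, c)))

def classify(tuples, classes):
    """Assign each tuple Ti to the class Cj with the highest similarity."""
    result = []
    for t in tuples:
        best = max(classes, key=lambda name: similarity(t, classes[name]))
        result.append(best)
    return result

# Each class Cj is defined by a numeric vector <cj1, ..., cjk>.
classes = {"A": (1.0, 1.0), "B": (5.0, 5.0)}
print(classify([(0.9, 1.2), (4.8, 5.1)], classes))  # → ['A', 'B']
```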
In the simple similarity-based method, a representative vector for each class must first be determined. We calculate the center of each region, find the distance between each item and the centers, and place the item in the class whose center it is most similar to, that is, the class whose center is at the smallest distance. Here is the algorithm: the centers of the classes in the partition are computed first, the input is a tuple to be classified from your database, and the output is the class to which the tuple is assigned. The algorithm first sets the distance to infinity, then, for each class, computes the distance from the tuple to that class's center; whenever this distance is less than the current value Dist, it records that class and updates Dist. So we see here that the tuples closest to the center of class A are put into class A, the tuples closest to the center of class B into class B, and likewise for class C; every tuple is placed with its nearest center, and we obtain three classes, A, B and C, based on the similarity or distance measure. The next aspect we have to consider is nearest neighbors, where we consider the K nearest neighbors of a particular item. We make certain assumptions: the training set includes not only the data items but also the desired classification for each item. The training data thus becomes the model for classifying a new item, and the distance from the new item to each item in the training set must be determined. So first we create the model from the training data, and then we perform classification. The K closest entries in the training set are considered, and the new item is placed in the class that contains the most items from this set of K closest items.
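The center-based algorithm described above can be sketched as follows. This is an illustrative Python version under the stated assumptions; the names `center` and `assign` and the sample data are my own, not from the lecture.

```python
# Sketch: compute the center of each class from its training tuples,
# then assign a new tuple to the class with the nearest center.
import math

def center(tuples):
    """Component-wise mean of a list of equal-length numeric tuples."""
    k = len(tuples[0])
    return tuple(sum(t[i] for t in tuples) / len(tuples) for i in range(k))

def assign(t, centers):
    dist = math.inf              # start with distance = infinity
    chosen = None
    for cls, c in centers.items():
        d = math.dist(t, c)      # Euclidean distance to this center
        if d < dist:             # keep the class with the smallest distance
            dist, chosen = d, cls
    return chosen

training = {"A": [(1, 1), (1, 2)], "B": [(6, 6), (7, 5)]}
centers = {cls: center(ts) for cls, ts in training.items()}
print(assign((2, 2), centers))  # → 'A'
```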
So the closest neighbors determine the class of the new item, and this is how we develop classification using KNN; again, we use the distance measure to find out which are the closest neighbors. The KNN method takes as input the training data, K (the number of neighbors to consider), and the tuple T to be classified, and outputs the class to which T is assigned. The algorithm to classify a tuple using KNN finds the K training tuples at the smallest distance from T, treats them as the most similar, and assigns T to the class that occurs most often among them. So the tuples that are closest, next closest, and so on are the neighbors considered at each level. The KNN technique is extremely sensitive to the value of K; a common rule of thumb is to choose K less than or equal to the square root of the number of training items. For our references, we have used the sources listed. Thank you.
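The KNN procedure above can be sketched as follows. This is a minimal illustration under the lecture's assumptions, with hypothetical names and sample data; it also applies the K <= sqrt(n) rule of thumb when choosing K.

```python
# Sketch of KNN: the labeled training set is the model; a new item
# takes the majority class among its K nearest neighbors.
import math
from collections import Counter

def knn_classify(training, t, k):
    """training: list of (tuple, class) pairs; t: tuple to classify."""
    # Sort the training pairs by Euclidean distance to t, keep the K nearest.
    neighbors = sorted(training, key=lambda pair: math.dist(pair[0], t))[:k]
    # Majority vote over the classes of those K neighbors.
    votes = Counter(cls for _, cls in neighbors)
    return votes.most_common(1)[0][0]

training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((7, 5), "B"), ((6, 7), "B")]
k = math.isqrt(len(training))        # rule of thumb: K <= sqrt(n), here K = 2
print(knn_classify(training, (2, 2), k))  # → 'A'
```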