Welcome to the fourth lecture of the course on Pattern Recognition under the NPTEL Phase II program. In the first lecture you heard from Professor C. A. Murthy about the importance, significance, applications and purpose of pattern recognition. In this fourth lecture we will consider an important aspect of the field, in fact two aspects, and try to understand the difference between the concepts of clustering and classification.

The difference between clustering and classification is best understood with an example. Let us start with classification, which is easier to understand, and then look at the problem of clustering. If you look at the slide, what you now see are two images of two different flowers. If we ignore the background in these images, and you are given the problem of classification, of saying that the white flower is one class and the second flower, which looks different, is class 2, then you have to identify one property, one attribute, one difference between these two images in terms of something that belongs to the flower. That property could be colour here: on the left-hand side you see an image of a flower and on the right-hand side you also see an image of a flower, but their colours are different. On the left the flower is white and on the right the flower is yellowish. So I pick colour as an important property, which we will later call a feature; we will define what a feature is, but for the time being take it in a very simple sense as a characteristic, attribute or property of the object on which we are solving the classification problem. Colour is a very important feature here. Going back to the slide, if we can extract colour from these two images, again ignoring the background, then anybody can easily distinguish between the flower on the left and the flower on the right. Of course there are richer features beyond colour, such as the shape of each flower, which we will consider later.

Now let me bring in a third object which is completely different. As you can see, this object is not a flower; it is an animal. But if you use colour as the only feature, you will not be able to distinguish between the white flower on the left and the animal, which also has a white colour. This example shows that on certain occasions, if you want to distinguish between the three types of images just shown, colour will not satisfy the criterion for classification between the flower and the animal. It might work in certain cases, between the yellow flower and the white animal, but if the flower and the animal have the same colour, in this case white, then you will not be able to discriminate between the two. So this example shows that you will need more than one property, more than one feature, more than one characteristic or attribute of the objects or signals we perceive, in order to distinguish one from the other.
Let us move forward and see more examples of classification. Look at this problem: you have several cars, and the problem of classification now is to distinguish between different models of car. If I use colour again, you can see that the colour of the car on the extreme left is the same as that of the sample on the extreme right. So you have to use more than one feature, perhaps the shape of the object, the shape of the windows, the position of the wheels with respect to the body, various attributes of shape in addition to colour, to distinguish between different categories of cars which look almost the same. I am not trying to classify flowers against cars, or cars against animals, but between cars themselves. All cars look more or less the same: they have four wheels, a windscreen, doors, windows and so on, so their shape and size look very similar. Of course there are other types of vehicles as well, but between cars, when the objects look very similar, the problem of classification becomes more difficult.

Look at the next example in the same slide. These are four different examples of fingerprints scanned from different individuals, and there are experts who can actually distinguish between these four categories of fingerprints. How, I will not go into in detail, but there are features extracted from this type of line diagram, the fingerprints after they have been scanned, digitized and rectified. The experts look for certain patterns, such as ridge orientations and minutiae points, ridge endings and bifurcations, to distinguish one fingerprint from another. The fingerprint is a very good biometric trait: you can use it to uniquely identify an individual, and it has huge applications in various fields of biometric access control and criminal identification, as in forensics.

Let us look at another example of biometrics. You are seeing here examples of twelve different face images. What is the problem of classification in this case? It is called face recognition. The problem of face recognition involves trying to identify an individual, to recognize him or her, from the patterns of the face. Now all faces look similar: all of us have two eyes, one nose, lips, ears and so forth. Hairstyles may differ, and there may be other patterns in the face. So the variability of the patterns within the face from one individual to another, in terms of eyes, nose, grey-level pixel variations, perhaps structure and other cues, has been exploited by pattern recognition scientists to design classifiers which can identify an individual under certain conditions. It may not work well under all conditions, but in certain cases, yes, it is possible.

Now if you look back at this problem of trying to classify between faces, between fingerprints, or between cars, it is a little different from the first easy problem which I showed you, between two flowers or between a flower and an animal. Why? There the samples were very different: the images were different, the objects were different from one another. In the case of face recognition all the faces look very similar, and if you extract properties from them, those properties will be very similar; the same holds for fingerprints, and for cars or other similar objects.
So the problem of classification becomes difficult when you are trying to distinguish different types of objects within the same category: face recognition, fingerprint recognition, distinguishing between types of cars, or trying to identify patterns within a signature. The samples may be very similar, and the problem becomes very close to a clustering problem, which we will understand in a little while.

Let us get a little more specific about features and classification. Look at this very simple example, which has been discussed in many books: sorting incoming fish on a conveyor according to species, using optical imaging or optical sensing. There may be two classes of fish which we are trying to classify or categorize; these are the two examples shown, but you can of course take any other samples of fish you want, for instance the classic varieties of hilsa and rohu which you find in India. What do you do with these species? You try to extract certain properties that could be used to distinguish between the two types of fish: the length of the fish, the lightness of the fish (which could be the brightness of the intensity), the width of the fish, the number and shape of the fins, the position of the mouth with respect to the body, and so on. These are some examples of properties or characteristics which you can measure. Of course there are other things you could measure, like the weight of the fish, and some things you probably cannot measure, like the smell or the touch. These are important properties or characteristics which help us distinguish a sample of one category from a sample of another, in this case two different categories of fish.

So these, as you can see on the right-hand side of the slide, are examples of features which one has to extract so that they help in distinguishing one class of object from the other. These are the listed features which can be used, but of course you can use a few more if you can extract them and then use them in your classifier.

Here is a definition of the word feature, which you may find in some dictionary: it is a property or characteristic of an object, sometimes quantifiable and sometimes not, which is used, and that is the main purpose, to compare, distinguish between, or classify two objects. I repeat: it is a property of an object which is used to compare or distinguish between two objects, or to classify them. Henceforth, in the rest of the lecture, we will deal with these features. We have seen some examples of features; to repeat them from the beginning of this lecture: colour, shape, intensity, size, minutiae features, length, width, shape features (we will see some more examples), and the positions of certain important landmarks in images or objects. Using these features, the question that comes up is: how can you do classification? And of course the other question is: what is clustering? But before we go there, we will look at some very important, desirable properties of features.

Features must be invariant to translation, rotation, scale, noise and other types of projective transforms. This has much to do with images. What does it mean? In an image, if you rotate the object, place the object anywhere on your screen, scale the object to make it bigger or smaller, if there are noisy artifacts in the image, or if there are projective transforms, the features extracted should remain essentially the same.
As you can see, if I move away from you I appear smaller; this is a classic example of scale variation due to a projective transform. A classifier should still be able to recognize my face, invariant to where I am with respect to the sensor or camera which is picking me up. Features are expected to be invariant to all sorts of affine transformations; going back to the slide, some examples are rotation, translation, scale, noise and projective transformation.

Other desirable properties of features: a feature must be distinct and unique for a given shape, object or signal. If you have these two objects, one a USB mouse and the other a pen, you can see that features such as intensity, or the ratio of height to width, will be different for the two objects. So, going back to the slide, a feature must be distinct or unique for a given object. The cost of computing a feature must not be very high, because otherwise the classifier will take a lot of time to do the processing. And a feature must degrade gracefully with discontinuities and missing parts: if the picture is not very clean and there are certain discontinuities in the object which has been imaged, the features should still not show much variation; they should still take a similar, almost identical value when extracted from the image.

These are some examples of features used with visual patterns; you do pattern classification mostly with images, although you can also perform classification with non-visual signals, which we will come to in a moment. Let us look at visual patterns, which are among the most important examples in pattern recognition. Some of these will be explained later on, but for now let us just name them. There are shape features: normalized central moments, elongation, compactness, connectivity, Euler number. Sometimes you need to model the background or foreground texture of an object, so you use measures like the gray-level co-occurrence matrix (GLCM), filters like Gabor filters and wavelet filters, colour histograms, and the GMRF or Gaussian Markov random field. The abbreviations which you see are very rich features which have been discovered and worked on over the last decade or so: HOG is the histogram of oriented gradients, I repeat, histogram of oriented gradients; DoG is the difference of Gaussians; SIFT is the scale-invariant feature transform and SURF is speeded-up robust features; CSS is curvature scale space; then chain codes, polar signatures and corners. DFT is the discrete Fourier transform, DCT the discrete cosine transform, DWT the discrete wavelet transform, STFT the short-time Fourier transform, and there is the Hilbert transform; these are different types of transforms which are used. MAT is the medial axis transform, LTP is the local ternary pattern, LBP is the local binary pattern, and the shape context is a very rich feature for representing shapes. For motion, people use optical flow vectors, and also superquadrics for modeling blobs when they see them in images. These are some examples of features which you can extract from visual patterns in images.
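To make two of these named features concrete, here is a minimal Python sketch, assuming the numpy and scikit-image libraries are available, that computes a HOG descriptor and an LBP texture histogram from a built-in sample image; the image and the parameter values are purely illustrative and are not taken from the lecture slides.

```python
# Illustrative sketch (not from the lecture): two of the listed visual
# features, HOG and LBP, computed with scikit-image on a sample image.
import numpy as np
from skimage import data
from skimage.feature import hog, local_binary_pattern

image = data.camera()   # a 512x512 grayscale sample image

# Histogram of Oriented Gradients: gradient-orientation histograms over
# small cells, normalized over blocks, flattened into one feature vector.
hog_vector = hog(image, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2))

# Local Binary Pattern: each pixel is coded by comparing it with its
# neighbours; a histogram of the codes describes the local texture.
lbp = local_binary_pattern(image, P=8, R=1.0, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)

print(hog_vector.shape, lbp_hist.shape)
```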
The pattern recognition machinery can be applied to any type of signal: visual, as well as non-visual, something which you hear, say an audio signal. Let us go to features used with other signals. These are examples of features used in audio, music and speech: the cepstral coefficients, LPC or linear predictive coefficients, the spectrogram, pitch, chroma or harmony. Then of course you can have model parameters, which work for visual or non-visual signals alike, such as the hidden Markov model (HMM), the ARMA (autoregressive moving average) model, and the conditional random field (CRF), somewhat similar to the GMRF. People have also extracted features like rhythm, beat and tempo from music, and of course you can measure weight, odour or taste and use them as features as well. That is a huge list of features, and it is not exhaustive; some I could not mention here because there is a technology involved in them, like PCA features, or the LDA or ICA features, which we will talk about as the course goes on.

With respect to features, we should know what a feature vector is, because a single feature is usually not sufficient for the purpose of classification; a set of features used together to perform classification forms a feature vector. So what is the difference between a feature and a feature vector? You can say that a group of features combined together forms a feature vector. What is the dimension of the feature vector? It is the number of components, the number of features which you have extracted to form the vector. Let us say you take three different features: colour, the area of an object, and say the eccentricity of the object, or a parameter like the perimeter. Take, say, perimeter, area and colour: three very simple features. You want to use all of these features because you know that, using them together, you can solve the classification problem much more accurately than by picking any one or two of them. So when you take all three features together you form a feature vector, where the first component could be colour, the second could be area and the third could be perimeter, or it could be something else; it does not matter. What do you expect? This feature vector, extracted from a sample belonging to one class or category of object, should be different from the feature vector extracted from another object. The problem of classification is to distinguish between these feature vectors belonging to two different classes of objects, or, if possible, to learn from a priori samples what these feature vectors look like for samples belonging to class A and how they differ from samples belonging to class B. That is the problem of classification, and it will lead us automatically to the idea of clustering as well.

For example, if you go back to the slide, this is an example of a two-dimensional feature vector. We talked about this pair of features to be extracted from fish: if the two features we extract are, say, brightness and length, forming the two components x1 and x2 as indicated in your slide, then the capital X to your left is a feature vector; the transpose indicates that we typically treat it as a column vector rather than a row vector, so it is written here as the transpose of a row vector. The feature vector will then be of dimension 2: one component could be brightness, the other could be length, or they could be something different.
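As a minimal sketch of this idea, a feature vector is simply an ordered collection of measured feature values; the feature names and numbers below are made up for illustration and are not taken from the slide.

```python
# Minimal sketch: assembling measured features into a feature vector.
# The numbers are invented; in practice each value would be measured from
# a sample (for instance a segmented fish image).
import numpy as np

# Two features per sample: x1 = lightness, x2 = length (arbitrary units).
sample_a = np.array([0.82, 14.3])   # feature vector of one sample, dimension 2
sample_b = np.array([0.35, 21.7])   # feature vector of another sample

# With three features (e.g. colour, area, perimeter) the dimension becomes 3.
sample_c = np.array([0.60, 112.0, 48.5])

print(sample_a.shape, sample_c.shape)   # (2,) and (3,)
```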
If you select or identify a set of 10 features from an object, your feature vector will be of dimension 10, or even larger if you have more features. Feature vectors lie in a feature space. This has to be learned; it is something like a vector space. Samples of the input, when represented by their features, appear together as points in the feature space. If a single feature is used, then you are working in a one-dimensional feature space. So you have to understand that a feature vector is represented as a point: a vector has a magnitude and an orientation depending on the values of its components, but in a feature space, which is like a vector space, which could be 1-D at the minimum but is generally of very high dimension (of course we can visualize only up to three dimensions), a feature vector basically indicates a point in that space. The dimensionality of the feature space is dictated by the dimension of the feature vector.

Of course you can have variations, such as the one-dimensional case: these are examples of points representing samples in a one-dimensional feature space. What does the colour indicate? It is possible that the red points are features extracted from one category of object, and the blue points, to your right, represent another category. But we have extracted just one feature, hence the feature space is one-dimensional and the points are marked in 1-D. Of course you can also have features in two dimensions, where you have points in a two-dimensional space, which we will see in the next slide; in general you can have an n-dimensional feature space.

Next we are going to see an animation which will take you into this feature space. Since our visualization is restricted to 3-D, you will see this animation in 3-D space, and you will see a set of feature vectors; sometimes they form one cluster, sometimes two distinct clusters, or a weird-looking cluster. We will see what a cluster is and what classification is, see the difference between the two in this animation, and then come back and continue the discussion of the difference between clustering and classification.

So let us look at the demonstration on the screen now. This gives a visualization of a feature space in 3-D; you may actually feel that you are traversing space in a shuttle. What you have to realize is that each of these dots on the screen represents a point in a feature space, and a point in a feature space is also a feature vector. Now, as we are passing by, you will not notice any particular pattern or group or cluster, nothing remarkable is happening here; but as we move ahead you will start to notice a distinct set of points which appear as a dense set of points, a compact cluster, a dense cluster. The distance between a pair of points marked in yellow is in general much smaller than the distance between any two points in the background marked in white. So you see a cluster being formed here, marked in yellow: this set of points forms a dense cluster. The average distance, if you are able to measure it, between any pair of points here is much smaller than the distance between any two points marked in white in the background; that distance is much larger than any pair-wise distance you see inside the cluster. The question that comes up is: how do you estimate that this is a dense cluster?
You must have some measure by which the cluster density can be estimated from this feature space. Remember, we are observing this space in three dimensions, but in general a vector can be of very large dimension and the same thing may occur in a large n-dimensional space. So how do you measure the density? If you move on in the space, you will find after some time two spheres, two spherical volumes, marked in two different colours, around the dense set of cluster points marked in yellow. If you pick any one of this pair of spheres, you will find that what we have done is mark the yellow points which fall within that sphere in a different colour: you can see here that they are marked in red, although the spheres overlap; remember this is a three-dimensional visualization. So the yellow points within the blue sphere are marked in red, and the yellow points within the red sphere are marked in green. I have shown two spheres just to indicate that if we count, or estimate, the number of points inside a sphere, it gives an indication of density. From the basic definition we know that density is mass divided by volume. So, assuming that the volumes of both spheres are identical, if the number of points within a particular sphere is large, then we estimate the density at that point to be much larger than the density at some other place. In general, the density estimated around any region within the cluster will be much larger than if I had placed this sphere out here in the outer space, where you do not have a cluster but only a few sparse points. It is possible that this cluster was generated by a set of feature vectors obtained from a certain class or category or type of object; the other points may not belong to any particular category or class. So this is how you measure density. Of course you can replace the sphere by a unit cube: an estimate of the number of points, which effectively represents the mass in that region, divided by the volume (you can visualize this as a unit sphere or a unit cube) gives you an idea of the density. Cluster density is one common measure used to detect the presence of a cluster. I repeat: the density will have a reasonably large value when you measure it somewhere in this region of the cluster; if you measure it somewhere else you will get a much smaller value, indicating the absence of a cluster.

So that is what we saw as a cluster. Do all clusters appear the same? Well, maybe not. Let us move a little further in space. On the left-hand side of your screen you now see two different clusters, one marked by red dots and the other, which we had earlier, by yellow dots: two different clusters. What is the significance of the two colours which I have used to indicate the two clusters? It is possible that the feature vectors in this cluster were measured from images or signals of one type of object, and this other cluster was obtained from another class of object. You can have two classes, class A and class B, class 1 and class 2, and this picture shows two different clusters, in which I can place these two spheres, count the number of points falling within each sphere, and estimate the density of both clusters, indicating that I have two distinct clusters here.
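The sphere-counting idea can be sketched in a few lines of Python; the synthetic data, the radius and the query points below are arbitrary assumptions, used only to show that the estimate comes out large inside a dense cluster and small in the background.

```python
# Sketch of the density idea discussed above: count how many feature points
# fall inside a sphere of fixed radius around a query point and divide by
# the sphere's volume. All data here is synthetic.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
cluster = rng.normal(loc=[5.0, 5.0, 5.0], scale=0.5, size=(200, 3))   # dense cluster
background = rng.uniform(low=0.0, high=10.0, size=(200, 3))           # scattered points
points = np.vstack([cluster, background])

tree = cKDTree(points)
radius = 1.0
volume = 4.0 / 3.0 * np.pi * radius**3

def local_density(query):
    # number of points inside the sphere ("mass") divided by its volume
    count = len(tree.query_ball_point(query, r=radius))
    return count / volume

print(local_density([5.0, 5.0, 5.0]))   # large value: inside the cluster
print(local_density([1.0, 9.0, 1.0]))   # small value: in the background
```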
There is no cluster elsewhere, because when you measure the density somewhere in the background you will not detect the presence of any cluster; the cluster density value you estimate there will be much smaller.

This figure also illustrates the problem of classification vis-a-vis clustering. We have two different clusters, all right, but the clusters are separated in space: there is a distance between them, and that distance indicates how far apart the two classes, categories or object types, for which we want to perform the pattern classification task, are. Remember the difference: detecting clusters is the job of clustering. After you have detected clusters, say more than one cluster, it is possible that the two different clusters belong to two different object types, and then you can find the distance between the two clusters. One of the simplest methods is to compute the average, the centroid, of the points in each cluster: for the red markers and for the yellow markers you can find the mean, also called the class mean, the average or the centroid of the cluster, and look at the distance between them. This distance is called the separability between clusters. I repeat: this distance gives an indication of the separability between clusters. The more the classes or categories are separated in space, the larger this distance will be; the less the separation (we will see an illustration of that), the more the distance will collapse, or come down.

If you move ahead, what you see now is an interesting phenomenon: I have removed the colour of the dots. If you remove the colour of the dots, that means you do not know which point belongs to which class. You can still detect these two clusters in space very clearly, and in fact now I have also removed the background; you can detect these two clusters using cluster density estimation techniques, one of which we have just discussed: place a sphere and count the number of points inside it, and you will get some idea of the cluster density. You can also measure the distance between the two clusters if you are able to detect them. But the question is: what if the class labels, as they are called, which were given by the colour markers (remember, one cluster was marked in red and the other by yellow markers), are lost? The only colouring we now have is for the points inside the spheres; we have lost the class colour information. Then it becomes a different problem, sometimes called unsupervised classification. This is the case when you have to detect clusters and assume that each cluster belongs to a separate class, category or type of object: this one could represent a particular type of fruit, that one a particular category of flowers, let us say. The good point here, at least up to this point in time, is that the clusters are still separated in space; there is some distance based on which you can detect them separately. The problem can become more difficult, as you can see now that I am bringing one cluster close to the other: you can see the region here very clearly, the region where the points start to overlap. This is the region where classification becomes a very difficult problem.
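Here is a minimal sketch of the centroid and separability computation described above, on synthetic two-dimensional data; the cluster locations and spreads are assumed purely for illustration.

```python
# Sketch of measuring separability: compute the centroid (class mean) of
# each cluster and the Euclidean distance between the centroids.
import numpy as np

rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
cluster_b = rng.normal(loc=[6.0, 6.0], scale=1.0, size=(100, 2))

centroid_a = cluster_a.mean(axis=0)
centroid_b = cluster_b.mean(axis=0)

separation = np.linalg.norm(centroid_a - centroid_b)   # inter-cluster distance
spread_a = cluster_a.std(axis=0).mean()                # rough within-cluster scatter
spread_b = cluster_b.std(axis=0).mean()

print(separation, spread_a, spread_b)
# The larger the separation relative to the spreads, the easier the problem.
```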
If you visualize all these points together, can we have that visualization? Yes, look at it now: this could actually be treated as a single cluster by itself. That cluster will not have a regular appearance, unlike the two distinct clusters when they were separated, but it could still be treated as one cluster. It is in this situation, where the clusters start to overlap, that both the problem of clustering and the problem of classification become hard. If we reintroduce the colours, let us see how it looks. Look at this problem: it is not that difficult, but we are entering a situation where the classifier, given the class labels, that these points belong to one category, class A, and the points in yellow belong to another category, class B, will work reasonably well, perhaps, if you have a powerful classifier design, but it will make errors here, in the region where the cluster points of the two classes overlap.

So look at this problem of classification versus clustering which we are trying to discuss. This can be considered a problem of classification; if you remove the colours, it becomes a problem of clustering. Even here you can detect two clusters, but I am not sure whether a very naive classifier or a naive clustering mechanism can detect the two clusters prominently: it might detect two, it might detect three, it might detect four if the number of clusters is not known. If the number of clusters is not known, you might think that this is a single cluster with a very irregular shape; that is possible, and we will give some examples later on of cluster shapes which are geometrical in nature but not very smooth.

The other problem, which is not clustering, is the problem of classification, as you see here: I have reintroduced the class labels with the help of colour markers, and the problem of classification is to discriminate the features, or feature vectors, belonging to one class from those of the other class. What is the difficulty for classification in this case? There is an overlap; the distance between the classes is much less, much less compared to what we had earlier. Now look at this situation: it is a much better one for both classification and clustering, where what is called the inter-class distance between the two classes of feature points is much, much larger compared to the scatter, the spread, the width of the individual clusters. There is hardly any, in fact no, overlap. I can actually draw a line here and say that to the left of this line I have all the red points; think of an imaginary line here, and to the left of it I have all the red points, while to the right of the line which I am drawing on the screen I have all the yellow points. So classification becomes easy. If I bring the clusters close, as you can see, the distance between them becomes very small, there is a lot of overlap, and it becomes a problem for classification to produce accurate results. So we have seen two scenarios: ones where classification and clustering are easy, and ones where classification with class labels, or, if I remove the colour information, clustering as well, becomes a difficult problem. I will leave it to you to visualize that if these two clusters overlap a lot more, you may actually see just one visible cluster instead of two.
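As a concrete illustration of that imaginary separating line, here is a small sketch in which a single threshold on one feature classifies two well-separated synthetic clusters; the data and the threshold value are assumptions made only for this example.

```python
# Sketch: when two classes are well separated, even a single threshold on
# one feature (an "imaginary line" in feature space) classifies almost every
# sample correctly. The data and the threshold are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
red = rng.normal(loc=[-3.0, 0.0], scale=0.8, size=(50, 2))     # class A
yellow = rng.normal(loc=[3.0, 0.0], scale=0.8, size=(50, 2))   # class B

threshold = 0.0   # the separating line x1 = 0

def classify(point):
    return "A" if point[0] < threshold else "B"

errors = sum(classify(p) != "A" for p in red) + sum(classify(p) != "B" for p in yellow)
print("misclassified:", errors)   # zero, or very close, when the clusters barely overlap
```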
Even if the class information, the colour labels, is given under such heavy overlap, classification algorithms may struggle to give good results.

We will now list a few methods of clustering and classification. We have so far understood the difference between the two, what the problem of clustering solves vis-a-vis the problem of classification; we have seen animations, some static pictures, and also a table containing points that discriminate between the two. Now we will just name a few methods, a few state-of-the-art algorithms, mind you, because you are now at a point where you can understand how to write algorithms to solve these problems, and we are going to do that in the rest of this course, discussing many methods of clustering as well as classification. For now we will just name them; keep them in mind as we go through the rest of the course, where these methods will be discussed in detail.

Going back to the slides, you have a list of methods in the left-hand column which are names of methods used for solving the problem of clustering: representative points, split-and-merge, linkage (there are many methods of linkage, which we will discuss later), SOM, which stands for self-organizing map, a special class of neural networks which can also solve the problem of clustering although what it actually solves is a mapping problem, model-based methods, and vector quantization. We will discuss most of these in the rest of this lecture series on pattern recognition, not today. What are examples of classification methods? Well, the classic example is the Bayes decision rule, which will be discussed in detail. Then we have linear discriminant analysis, or LDA, based on Fisher's criterion; then methods based on the k nearest neighbours, called k-NN; the feed-forward neural network, the most commonly used architecture in the class of artificial neural networks (ANN), I repeat, the feed-forward neural network, which uses the backpropagation learning law; support vector machines, the most popular and powerful one to date; and random forests. We will discuss most of these as we go on.

Before we end the talk, we would also like to see the general categories of methods for clustering data. Most books categorize them into two classes: one is called hierarchical, or linkage-based, and the other is called partitional. Within the hierarchical methods you have a divisive policy and an agglomerative method of clustering; again, I repeat, you have two approaches, agglomerative and divisive, under the hierarchical category. For partitional methods, again there are two sub-categories: exclusive and probabilistic. Under exclusive methods you have the minimum spanning tree, which uses a graph representation, the k-means and the k-medoids; under probabilistic methods you have the GMM, the Gaussian mixture model, and fuzzy c-means (FCM), which is an extension of the k-means or c-means algorithm. This categorization of clustering methods is a very generic one; it may be that when you go to certain literature and read books, the categorization of the different clustering methods appears a little different from what I gave you.
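As a preview, here is an illustrative sketch of one clustering method (k-means) and one classifier (k nearest neighbours) from the lists above, using scikit-learn on synthetic data; the data and parameters are assumptions made for the example, and the methods themselves will be developed properly later in the course.

```python
# Illustrative sketch: k-means clustering (no labels used) and k-NN
# classification (labels used) on synthetic two-dimensional feature vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class_b = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 100 + [1] * 100)

# Clustering: only the feature vectors are used, no class labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))

# Classification: the class labels are used to train, then to predict.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("predicted class of [4.5, 5.2]:", knn.predict([[4.5, 5.2]]))
```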
But of course, you still have methods which are partitional, methods which are hierarchical, some which are probabilistic and some which are exclusive; I have confined myself here to a discussion of these four categories.

There is an alternative view of clustering algorithms as well: some would say that an unsupervised method of learning is also clustering, or that unsupervised classification is also called clustering, with examples such as k-means and k-medoids. This is where most people get confused. Do you think unsupervised learning, or unsupervised classification, is the same as clustering? Well, maybe not. Clustering has no class labels, and unsupervised learning has no class labels either, but the purpose is different. In clustering you try to group data, to find structure within the data. In unsupervised learning or classification you also tend to form groups, that is true, and you may detect clusters, that is also true, but they are actually not the same thing. It is not wise to call a clustering method an unsupervised method of learning or classification; in a very loose sense you can say that, but it may not be strictly true.

Other methods involve density estimation, which we talked about when we were using spheres to measure density in the feature space. Typical examples of density estimation: some of them are parametric, meaning that you estimate certain parameters of the density; the Gaussian, or mixture of Gaussians, the GMM as it is called, is the classic example. You can also take non-Gaussian distributions, although they are not that popular, such as the Dirichlet or beta distributions. Then there are other methods such as the piecewise quadratic boundary, the nearest mean classifier, or maximum likelihood estimation; these are just names of different methods based on parametric density estimation. There are non-parametric density estimation methods as well, as given in this slide, which are also often used: methods based on histograms, nearest neighbours, kernel-based methods, graph-theoretic methods and iterative valley seeking. Some of these, if not all, we will discuss in the remaining lectures.

I would like to conclude today's lecture on clustering versus classification with an acknowledgement to my master's student Mr. Pathik Srivastav for helping me create the animation, and to Mrs. Suranjana Samanta and Professor C. A. Murthy for helping me prepare the slides. Thank you, and do come back for the remaining set of lectures to understand the different methods of clustering and classification and many other applications. Thank you.