In the last session we were discussing a supervised method of pattern classification based on Fisher's linear discriminant analysis, or LDA; it is sometimes called FLDA in the pattern recognition literature. We had just introduced two different matrices: the within-class scatter matrix and the between-class scatter matrix. We will look at those expressions one more time, then look at some important properties and expressions of LDA, and we will wind up this class with a few examples which can be worked out by hand by you. So let us look back. LDA is a method of supervised learning; it is one of the classifiers which needs a set of training samples for learning a set of parameters. In this case the parameters are SW and SB, the within-class scatter and the between-class scatter respectively. The learning set is labeled, unlike PCA where we had unlabeled data, which is why PCA is also called unsupervised learning. LDA is a class-specific method in the sense that it tries to shape the scatter in order to make it more reliable for classification. Remember, in the case of PCA we were interested in finding the directions in which the maximum scatter of the entire data exists; that is what principal component analysis does. In the case of LDA we want to maximize class separability. That means, in some sense, the between-class scatter: if you take a two-class problem, we want to find a certain direction within the data, a certain dimension, in which the distance between the two class means, or the two clusters of the two classes, becomes large, and in the same direction the within-class scatter becomes small. If we recollect the animation slides from an earlier class, when we were distinguishing classification vis-a-vis clustering, we always said that it is easier for a classifier to perform well if the between-class distance, the distance between the two cluster centers or the two clusters themselves, is very large and the clusters are very compact. So if you can find such a dimension or set of dimensions, which can also be considered a subspace with respect to the original higher-dimensional data, a subspace whose directions satisfy the constraint that the inter-class distance is very large and the intra-class distance is very small, that is what LDA is trying to achieve. That is the meaning of the sentence you see now: LDA is a class-specific method which tries to shape the scatter in order to make it more reliable, or better, for classification. This is accomplished by finding a weight matrix W which maximizes the ratio of the between-class scatter SB to the within-class scatter SW. Let me define the terms again for ease of understanding. In the expression for the between-class scatter SB, N_i is the number of samples for a particular class X_i (X_i being the class label), mu_i is the mean of class X_i, and mu is the overall mean of the data.
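For reference, the two scatter matrices and the Fisher criterion described here can be written compactly as follows; this is the standard FLDA notation, reconstructed from the verbal definitions in this lecture:

```latex
S_B = \sum_{i=1}^{c} N_i \,(\mu_i - \mu)(\mu_i - \mu)^{T},
\qquad
S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T},
```
```latex
W_{\mathrm{opt}} = \arg\max_{W}\; \frac{\lvert W^{T} S_B W \rvert}{\lvert W^{T} S_W W \rvert},
\qquad \text{solved by the generalized eigenproblem } S_B\, w_i = \lambda_i\, S_W\, w_i .
```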
So, if you look at (mu_i − mu), the individual cluster centers are essentially normalized with respect to the mean of the entire data set, and the multiplication by N_i gives each class a weight equal to its number of samples; c, whether written capital or small, is the overall number of classes, and you sum over all the classes. That is the between-class scatter. Let us look at the expression once again for SW, the within-class scatter matrix. You again sum over all classes, but now the expression is similar to PCA: it is an outer product of the mean-subtracted samples. In the case of PCA you had the overall data mean mu, the same mu which you see on the left-hand side here; in the case of LDA you instead subtract the individual class means. mu_i is the mean of class X_i; you subtract it from each sample, take the outer product, and sum over all the samples of that class. x_k runs over the samples of X_i, so the inner summation has n_i terms, and then you sum over the c classes; the total number of terms in the double summation is therefore the total number of samples n (the n_i summed over all c classes). SW and SB have the same dimension as the scatter matrix we had for PCA, but the matrices themselves are a little different. Both are, in some sense, outer products, but one of them is computed from the class means only, while the other is computed from the samples with the class-specific means subtracted; in PCA, I repeat, we subtracted the overall data mean, here you subtract the class means. So with SW and SB in hand, what we are trying to do is find an optimal W. We will talk a little later about what happens if SW is singular; assuming for now that the within-class scatter matrix is non-singular, you try to find the value of W which maximizes this ratio. In the process you get an optimal W which can be written in terms of a set of eigenvectors: they are the eigenvectors of the matrix SW inverse multiplied by SB. In fact, what you are doing is finding the m largest eigenvectors of this characteristic equation; if you think of SW inverse multiplied by SB as an overall scatter matrix S, then you are finding the eigenvectors of that matrix, with lambda the corresponding eigenvalues. This is the reason why it is essential that the within-class scatter matrix be non-singular: you need to obtain its inverse, multiply it with SB, and then find the corresponding eigenvalues and eigenvectors. That is the basic approach of FLDA, and to follow it you need the inverse of the matrix SW. The question that comes up is: when is SW singular? We will have a look at that very soon; first, a few key points about the properties of the within-class scatter matrix SW.
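Before those properties, the whole procedure just described can be sketched in a few lines of numpy. This is a minimal sketch under my own naming (the function fisher_lda and its arguments are not from the lecture), assuming X is an (n, d) array of samples and y an array of integer class labels:

```python
import numpy as np

def fisher_lda(X, y, m=None):
    """Return the m leading eigenvalues/eigenvectors of SW^{-1} SB."""
    classes = np.unique(y)
    c, d = len(classes), X.shape[1]
    mu = X.mean(axis=0)                    # overall data mean
    SW = np.zeros((d, d))                  # within-class scatter
    SB = np.zeros((d, d))                  # between-class scatter
    for ci in classes:
        Xi = X[y == ci]                    # the n_i samples of class ci
        mu_i = Xi.mean(axis=0)
        Z = Xi - mu_i
        SW += Z.T @ Z                      # sum of (x_k - mu_i)(x_k - mu_i)^T
        dm = (mu_i - mu).reshape(-1, 1)
        SB += len(Xi) * (dm @ dm.T)        # N_i (mu_i - mu)(mu_i - mu)^T
    # assumes SW is non-singular (see the discussion that follows)
    evals, evecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
    order = np.argsort(evals.real)[::-1]   # sort by decreasing eigenvalue
    m = c - 1 if m is None else m          # at most c - 1 nonzero eigenvalues
    return evals.real[order][:m], evecs.real[:, order[:m]]
```

Truncating to m = c − 1 directions anticipates the property of the eigen-spectrum discussed next.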
There are actually at most c − 1 nonzero eigenvalues among the lambdas. So if we look at what is called the eigen-spectrum of this matrix, the upper bound on m here is the number of classes minus 1. Beyond that, the number of nonzero eigenvalues and corresponding eigenvectors available for W depends on the properties of SW, in particular whether it is singular or not. These are the main criteria to keep in mind: SW is singular if the total number of samples n is less than the dimension d. Remember, this is the total number of samples; you can think of it as the average number of samples per class multiplied by the total number of classes. And the rank of SW is at most n − c, where c is the number of classes. So the number of samples has a very important say on the singularity, or rank, of SW. Typically, if you have a sufficient, very large, number of samples, you do not need to worry about the singularity of SW. But there are applications where there is a dearth of samples, both per class and for all classes put together. A typical example is the face recognition problem. In face recognition the dimensionality may go to a few million; it is very, very large. In fact the dimension of the problem is the number of pixels of the image: if you take a 640 by 480 image, 640 multiplied by 480 is the dimension, whether you are doing PCA or LDA. You will not have that many samples per individual, or faces in total, in a database. Even in a very large database with a few thousand individuals, you may have around 10, or at most 100, samples per class; in fact there are situations with just about half a dozen to 10 samples per class, where a class here is a particular subject. In such a case the number of samples n will definitely be less than the dimension, so the constraint on this slide will not be satisfied in all applications or situations of pattern recognition problems. If the number of samples is less than the dimension, or the dimension is much larger than the number of samples, SW becomes singular, and its rank is at most n − c; we will move ahead with the assumption that the rank is exactly n − c for the time being. If, however, the number of samples is very large and exceeds the number of dimensions, which in certain cases is possible, you do not need to worry about the singularity of SW. The cases where you do need to worry, where the number of samples, even for all the classes put together, is less than the number of dimensions, are called SSS problems, small sample size problems. In such a case you cannot invert SW; it is not a full-rank matrix, its rank being restricted by the number of classes and the number of samples. The solution to such problems, when SW is singular, is the following: first project the samples to a lower-dimensional space.
To do that, we use the method of PCA, which we studied in the last class, to reduce the dimension from the original dimension d to a dimension n − c, and once that is done we apply the standard Fisher linear discriminant (FLD) criterion in the reduced space to come down to c − 1 dimensions. In such a case, when you apply PCA first to reduce the dimension, the optimal W can be visualized as the PCA projection followed by the W obtained from the FLD criterion. The W for the PCA step is given by maximizing the total scatter, where ST is the total scatter matrix; remember we had an expression earlier which said ST = SW + SB. W_FLD is then given by an expression similar to what we had before, except that the PCA is done before the LDA, so the corresponding PCA matrix comes and sits as a pre- and post-multiplication around SW and SB. So if n is small enough that SW is singular, you reduce the dimensionality to a subspace matching the expected rank of the matrix: do a PCA first, then an FLD, and people typically take c − 1 dimensions for the FLD. Now let us look at a hand-worked example. This first figure is a synthetic diagram drawn for two sets of data sample points; we had a similar diagram in the case of PCA, where PCA would have given a direction along the maximum scatter, or variance, of the data samples, along the elongated direction of the point cloud as the arrow indicates. LDA, on the other hand, gives you the direction of maximum separability: if you project the sample points onto it, you expect the red points to fall on the left-hand side and the blue points on the right-hand side. You may not have full separability in this case, because the data samples lie very close together, but this is the direction in which you will have maximum separability, the direction for which the SW inverse times SB criterion is maximized by the corresponding W. Let us take this hand-worked example; it is easy to do the calculations on a sheet of paper, though of course you can keep your calculators ready if you want. We take data sample points in 2D because 2D is easy to visualize. These are a set of (x, y) points: on the top you have the x coordinates, on the bottom the y coordinates, for two different classes, with the class label at the bottom. Class 1 has 7 points and class 2 also has 7 points, and in the two-dimensional (x, y) space the samples look like this: the samples of class 1 are the blue points to the left, and the samples of class 2 are the green points to the right. Let us try PCA first, before trying LDA; we will try LDA afterwards using the class information, but for now let us switch the class information off. Let us say I simply give you a scatter of points.
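As an aside before the example: the two-stage PCA-then-FLD recipe just described can be sketched as follows, reusing the fisher_lda function from the earlier sketch. This is again a minimal illustration under assumed names, not code from the course; for the truly huge dimensions of face images one would compute the PCA via the n × n Gram matrix rather than the d × d scatter matrix used here.

```python
import numpy as np

def pca_then_lda(X, y):
    """Two-stage pipeline for the small-sample-size case where SW is singular."""
    n, d = X.shape
    c = len(np.unique(y))
    # PCA step: keep the n - c leading principal components of the total scatter
    mu = X.mean(axis=0)
    Xc = X - mu
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)            # total scatter ST, up to scale
    W_pca = evecs[:, np.argsort(evals)[::-1][:n - c]]   # shape (d, n - c)
    # FLD step in the reduced space, where SW is now invertible
    Xr = Xc @ W_pca
    _, W_fld = fisher_lda(Xr, y, m=c - 1)
    return W_pca @ W_fld                                # overall (d, c - 1) projection
```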
So forget the color code for the two different classes; take all of these as one set of points, a set of black markers, and let us try PCA. That means we take the data samples and ignore the class labels. Doing that, the overall data mean for all the samples is obtained by taking the average of all the x values and the average of all the y values (which here comes out to 5), and this lets you compute the overall covariance of the mean-subtracted data: you subtract the mean from each sample point and take the outer product to obtain the covariance matrix, which is a 2 × 2 matrix as given here, because the data is two-dimensional. That is the size, or dimension, of your ST, the covariance or scatter matrix. The eigenvalues obtained after decomposing this matrix show that along the first dimension you have quite a bit of scatter, and along the second dimension much less than the first. What are the corresponding eigenvectors? This is the first eigenvector, which we will draw in the diagram very soon, and this is the second eigenvector. What do you expect? Along the first eigenvector you should expect a large scatter, because the eigenvalue is very large; along the second dimension you will also have a scatter, but it will not be as stretched as along the first. If you look at the diagrams, the first eigenvector, shown in the matching color, points along the direction of maximum scatter of the data, which is quite obvious if you look at it: the overall data forms an elliptical pattern, and this seems to be the major axis of that scatter. The other direction, also shown in blue, the same color code used for the printed components of the eigenvector, has much less scatter of the data samples, probably one third or even less of the scatter along the first. So what PCA gives you are two different directions, as we discussed in the earlier class: along the first there is a very large scatter, along the second there is also scatter but not as large as the first. How much scatter exists along the second, or even a third if it exists? Here of course we have a hand-worked two-dimensional example, but if you take an n-dimensional problem you get a scatter matrix of dimension n, n eigenvalues and a set of n eigenvectors, and along the first, second, third eigenvector and so on the scatter keeps diminishing. How quickly it tapers off depends on what is called the eigen-spectrum, the distribution of the eigenvalues along the diagonal of that diagonal matrix. In this two-dimensional case the scatter along the first direction is much larger than along the second, which is expected, and the ratio tells us how much more scatter there is along the first principal component, to say it precisely, than along the second.
This is the first principal component and this is the second; you could have had a third if the dimension were 3. The corresponding eigenvectors and eigenvalues are given here. This is what PCA will do, but as you can see, if you project the data samples along the first eigen-direction you will not have separability; in fact it is the second PCA direction, the second principal eigenvector, which gives you separability. So PCA should not be used for classification in general; we talked about this earlier. It is typically used for dimensionality reduction and data representation, for finding a subspace of maximum scatter, and up to that point it is fine. For example, you have just seen in today's class that PCA can come before LDA: we want to reduce the dimension in certain cases when SW, the within-class scatter matrix, is singular, so PCA precedes LDA to reduce the dimension; that is one of its applications. But if you actually want class separability, and class labels are of course available, you had better try LDA rather than PCA. So let us use this example in the next slide and move ahead, using the class information, now labeled with two different colors to show that the data on the left side belongs to class 1 and the data on the right side belongs to class 2. This is the same distribution we had for class 1 and class 2 in the previous slide, where we used the samples for PCA without the class information; now, using the class information, we perform LDA on this data. Look at this W. We gave the expression earlier; here are the values. You can use the entire data set, a two-class problem in two dimensions, to compute the 2 × 2 within-class scatter matrix and the between-class scatter matrix, as given here. The inverse of SW exists in this case: it is non-singular because you have enough data samples relative to the dimension of the problem. Inverting this simple 2 × 2 matrix I leave as an exercise for you; multiply it with SB, and this is what you get. What is the corresponding eigen-decomposition? The eigenvalues of SW inverse times SB are given here, and what do they basically tell you? Along the first eigenvector you will have a large separation, but along the second eigenvector given by LDA the scatter criterion is 0: the separability of the data there is zero. The corresponding eigenvectors are given here; let us draw them on the data. This is the first eigenvector given by LDA. Remember, PCA gave us the direction of maximum scatter; LDA gives the direction of maximum separability, the direction shown here in red, the same color used for the printed values. I have not shown you the other direction, but let me tell you it is orthogonal to this one, along a direction almost similar to that given by the first principal component of PCA.
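The exact coordinates from the slides are not reproduced here, but the contrast between the two methods can be checked numerically on any made-up two-class data of this shape, again reusing the fisher_lda sketch from above. The points below are illustrative, not the lecture's values:

```python
import numpy as np

X1 = np.array([[1., 2.], [2., 3.], [3., 3.], [4., 5.],
               [5., 5.], [3., 4.], [2., 2.]])        # 7 points, class 1 (made up)
X2 = X1 + np.array([4., 1.])                         # 7 points, class 2, shifted right
X = np.vstack([X1, X2])
y = np.array([0] * 7 + [1] * 7)

# PCA: leading eigenvector of the total scatter, labels ignored
Xc = X - X.mean(axis=0)
w_pca = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]          # eigh sorts eigenvalues ascending
# LDA: leading eigenvector of SW^{-1} SB, labels used
lam, W = fisher_lda(X, y)

print("PCA direction:", w_pca)                       # follows the elongated cloud
print("LDA direction:", W[:, 0], "lambda_1 =", lam[0])  # separates the two classes
```

With two classes, lambda_2 is 0, so only the first LDA direction carries any separability.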
Let us go back and have a look: if you take the first PCA eigenvector and invert its sign, it is (0.8, 0.6), which points towards the upper right, and that is almost the same as the direction of the second LDA eigenvector. Almost similar, not identical, but along that similar direction the separability is 0. So I am not drawing the second eigenvector, the main reason being that you are not expected to have any separability there; the criterion is null. In this particular case we take only the first dimension and project the data onto it to get the maximum separability. Now let us look at the effect of the inter-cluster distance on the eigenvectors and the eigenvalues, as an example. On the left-hand side you have the within-class and between-class scatter matrices of the data we just discussed in the previous example, the same values; you can check your notes that these are the values obtained from that data, and if you go back you can see that this is what SW and SB were. We keep those on the left-hand side; on the right-hand side I give you the same SW but a different SB, and let me tell you the mechanism by which I obtained it. What I did, or what you can do, is to separate the two clusters further apart. Take the data sample points from the previous slide and bring in more separability, shifting one cluster with respect to the other, not along the direction of maximum scatter but along the direction of maximum separability; I will show with the diagram what I mean by that. When you do this, keeping the within-class scatter the same, you can see the change: the between-class scatter values have increased, while the within-class scatter values remain unchanged. What this basically means is that I have not changed the distribution of the individual data samples; I have taken the two class means and moved them apart. I repeat: the intra-class, that is within-class, scatter of each particular set of class samples is untouched, but the class means have changed; say one of the class means is moved to bring in more separability. The result is a larger between-class scatter matrix and the same SW. Go ahead and do the decompositions for both, and the net result is the following: look at the eigen-spectrum of the original data and compare it with what you have now. I have arranged this judiciously so that the second eigenvalue stays zero, but there is an increase in lambda_1, the eigenvalue corresponding to the first discriminant direction obtained by LDA. Look at the corresponding eigenvectors: there is not much of a change; the changes are in the second or third decimal places, so the directions, and the magnitudes, have more or less remained the same. Now look at the data. On the left-hand side you see the data points given earlier, in the previous example, with the component given by the first eigenvector drawn on them.
We never consider the second eigenvector, because the corresponding eigenvalue is zero. So this is the first eigenvector, and the same is the case here, with only a minor change in its values; it is almost the same. But look at the separability: this is the separation I brought into the data, which gives large values for both diagonal terms of the between-class scatter matrix compared to the data we had earlier, and the net result is a larger eigenvalue. Again we can see that the separability of the data between classes is reflected in the corresponding eigenvalue. So this is the lesson: in some sense you have maximum separability along the first dimension, less along the second, and so on, and beyond some point you have no separability at all on dimensions beyond c − 1. The number of classes c here equals 2, so c − 1 = 1: there is only one dimension with any separability, and after performing LDA you will not have any separability beyond c − 1 eigenvectors. That is why, with c − 1 = 1, lambda_1 is non-zero and lambda_2 equals 0. But if you bring in more separability in the data, it gets reflected in the corresponding eigenvalues: the eigenvalues shoot up if the class separability is more, or if the within-class scatter gets reduced. This example shows that if the between-class spread or distance is more, the eigenvalues go higher; the same happens if the within-class scatter matrix values go down, that is, if the data samples sit much closer to the mean of their particular class. That also can change the value of lambda_1. So here is an example: if I take the data points and project them along this direction (there is a little bit of a scale problem here, so the direction is not drawn exactly, but it is more or less accurate; this was generated by a small program), you can see the separability: all the blue points are projected to the left-hand side. What you are seeing as the x-axis here is the axis along the principal eigenvector given by LDA: the blue points project on the left, as here, the green points project here on the right, and there is some degree of separation between the two sets of samples; the cluster means would be somewhere here. Now look at this other data set and project it onto the corresponding eigenvector: you can see the increased separability in 2D, and the same is reflected along the first principal eigenvector given by LDA. If you project the blue points they form a very tight cluster here, and the green points form another cluster there. When you compare the separability on the right-hand side with the earlier one, not only is the separation larger, but look at the within-class scatter: compare the scatter of the blue samples here with the scatter there, and likewise for the green points. Both examples show that the between-class scatter has increased and the within-class scatter, the scatter within a particular class, has come down.
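Continuing the illustrative toy data from the earlier sketch, the effect described here is easy to reproduce: translate one class rigidly so that SW is untouched, and watch lambda_1 grow. The shift vector below is made up:

```python
import numpy as np

shift = np.array([5., 1.])                 # extra separation for class 2 (arbitrary)
X_far = np.vstack([X1, X2 + shift])        # rigid translation: per-class scatter unchanged

lam_near, _ = fisher_lda(X, y)             # original toy data
lam_far, _ = fisher_lda(X_far, y)          # same SW, inflated SB
print("lambda_1 before:", lam_near[0])
print("lambda_1 after: ", lam_far[0])      # strictly larger; lambda_2 stays 0 in both
```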
Of course, one must be careful when comparing these two plots: remember, they are not to the same scale. The first has a range of about −3 to +3, so roughly 6 units across, while here the range is 10 units. On that scale of 10, the separation here is about 5 units in length, a huge separation, whereas the earlier separation was only around 1 to 2 units. So the separation has increased and the within-class scatter has come down: those are the two lessons, and they get reflected in the eigenvalues, not in the corresponding eigenvectors. To finish the examples, let us take one where I increase the number of classes to 3. The distribution of the points is similar: you have a set of points in blue for class 1, a set of points in green for class 2, and points in red for class 3. We will quickly go through SW and SB. Since the number of classes is 3, you can expect separability in c − 1 = 2 dimensions. When you perform the eigen-decomposition of SW inverse times SB you can see that both eigenvalues are now non-zero, and the corresponding eigenvectors are given here. You have some small amount of separability along the other, second dimension, but look at the first dimension: a high degree of separability, reflected in the corresponding eigenvalue. This is the first direction of separation for the classes. You can see class 1, class 2 and class 3 here; this is of course the best possible dimension onto which you can project the samples to get a fair degree of separability, while along the other dimension you may not get much, which is why its discriminant eigenvalue is small. When you project the samples onto the first eigenvector, this is what you get; I have kept the inclination of the projection axis the same, so take each blue point and project it here, each green point here, each red point here, and tilting it for display, this is the separability you get. You have a good amount of separation of the blue class from the green and the red points, but some degree of overlap between the red and green points which you probably cannot overcome: the red points land here, and the blue points, the original data points in the two-dimensional space, all project there, so there is a fair degree of separability along the first eigenvector obtained by LDA. The same is not true along the second eigenvector, shown by the green arrow: when you project onto it you see a scatter in which all three classes overlap heavily. So this is another example, one you can work out by hand with a calculator, where the eigen-spectrum, the set of eigenvalues obtained by the decomposition, tells you the degree of separability between classes. In cases where classes still overlap, there are other methods of learning, such as independent component analysis, which one can even try in order to separate the data.
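A three-class version of the toy sketch shows the c − 1 rule directly: with c = 3, two eigenvalues of SW inverse times SB come out non-zero. The third cluster below is again made up:

```python
import numpy as np

X3 = X1 + np.array([2., 6.])               # third made-up cluster above the others
X_abc = np.vstack([X1, X2, X3])
y_abc = np.array([0] * 7 + [1] * 7 + [2] * 7)

lam3, W3 = fisher_lda(X_abc, y_abc, m=2)   # c - 1 = 2 discriminant directions
print("eigenvalues:", lam3)                # both non-zero; the first dominates
proj = (X_abc - X_abc.mean(axis=0)) @ W3   # samples projected onto the two directions
```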
To wind up this class, I have put forward a list, not necessarily exhaustive, of some advances in the field of pattern recognition which have taken place over the last decade. I am just going to name them rather than discuss any of them in detail. As Professor Murthy and I have said, this is a course for beginners in the field of pattern recognition, and we hope these lectures encourage you in this field of study, to read books and more advanced topics which have come out in several other texts as well as in the advanced literature being published in research conferences and journals. So let us look at the list of techniques which we have not covered and which are some recent advances. People adopt soft computing methods based on neuro-fuzzy techniques: we talked a little about the perceptron and neural networks, and you can combine those with fuzzy reasoning to get a neuro-fuzzy architecture for class discrimination; these are methods often used to discriminate between classes which overlap, or which do not follow a Gaussian distribution, and so on. There is multi-classifier ensemble, or combination, which involves both decision and feature fusion: this is about using a set of different classifiers together to form a unified decision, where the classifiers themselves differ in architecture or are trained with different numbers of samples; there is a lot of theory on this. People also work on reinforcement and probabilistic learning. There are works which try to handle the small sample size problem, the "triple S" (SSS) problem, in the fields of neural networks and pattern recognition, and people work on the generalization capabilities of neural networks and pattern recognition algorithms. People work on evolutionary computation, which is a sub-paradigm of soft computing. There are methods based on decision trees, and on multi-objective clustering: most of the clustering algorithms we have discussed try to minimize one criterion, for example K-means tries to reduce a sort of Euclidean distance with respect to the mean, but there are better clustering algorithms which try to minimize one criterion while keeping some other constraint in mind; those are examples of multi-objective clustering. Very recently people have also been working on manifold-based learning and optimization, taking ideas from the areas of differential geometry and topology, and on genetic algorithms, pervasive computing, and neural dynamics. Support vector machines, kernel methods and modern machine learning methods have contributed a lot to the field of pattern recognition; in fact there are certain areas in which machine learning and pattern recognition overlap so much that it is very hard to say whether you are talking exclusively about machine learning by itself or about pattern recognition. Some of the terms and methods are semi-supervised learning, transfer learning, deep learning, domain adaptation, and so on. These situations have become very important because it is possible that you may not have a large number of samples in a certain data set while an auxiliary or ancillary data set has a huge number of samples: you could train your classifier with those samples and see if it performs well in the other domain. Often it does not; you cannot simply train on one data set and test on another, so there are methods which try to transfer the information from one domain to the other.
There are other methods of transfer learning and of semi-supervised learning, in which you actually have human intervention in between: someone tells you whether some of the classifier's decisions made during testing are correct or not, and you readjust the weights of that particular classifier and carry on. There are methods based on random forests, and on independent component analysis; let me put in a one-minute word here. Remember PCA and LDA; then you have ICA. People talk about these three almost together. PCA works without class labels and gives you the scatter along the maximum direction; LDA is supervised learning which tries to give you the scatter along the direction of maximum separability between classes. The difference is that PCA takes the entire data into account and gives you maximum scatter, whereas LDA gives you maximum separability: we know the criterion, SW inverse multiplied by SB, which it tries to maximize, and you need class labels for that, hence LDA is supervised while PCA is unsupervised. ICA relaxes one constraint of LDA: it needs class labels, it is a supervised method of learning in that sense, but it relaxes the constraint that the second eigenvector be orthogonal to the first, the third orthogonal to the first and second, and so on. There are certain data sets in which, after you obtain the first principal eigenvector, the one with the maximum separability, it is not necessarily true that the direction of the second-best separability is orthogonal to the first; that may not be the case. In such cases you need a mechanism which finds the component with the second-largest degree of separation in whatever direction it lies, not necessarily orthogonal to the first; in general, if you look at some i-th component, it may not be orthogonal to the previous i − 1 eigenvectors. That is what ICA tries to do, overcoming this restriction of LDA. People have worked on pulse, spike, and probabilistic neural networks, and there are methods of nonlinear and convex optimization, related both to manifold-based learning and to modern methods of machine learning, which people are trying out to solve complex problems in the field of pattern recognition. Graph-based kernels and embeddings are also among the methods of structural and syntactic pattern recognition, which we have not covered in this course. Thank you very much.