Today I will be talking about the fuzzy c-means algorithm. Before that, let me say a little about soft computing techniques, since fuzzy sets are one of the components of soft computing. Soft computing techniques were first introduced by Lotfi Zadeh; actually, in 1965 he introduced fuzzy sets, in his famous paper, which was published in Information and Control.

A fuzzy set is a variation of the ordinary set. In ordinary sets, if you have a set A and a point x, then either x belongs to A or x does not belong to A; only one of them is true. Another way of saying it is that you have an indicator function of the set A, where $I_A(x) = 0$ if x does not belong to A and $I_A(x) = 1$ if x belongs to A. Zadeh defined a fuzzy set A through a membership function $\mu_A$, which takes not only the values 0 and 1 but can also take values in between; that is, $\mu_A(x)$ takes values in the interval [0, 1].

There are many examples. Suppose you want to look at the set of all tall persons. If someone's height is, let us say, 5 feet 5 inches, would you call him tall? Probably you would call all the persons whose height is more than 5 feet 7 inches tall, and all the persons whose height is less than, say, 5 feet 2 inches not tall, but from 5 feet 2 inches to 5 feet 7 inches you may not have a clear-cut idea of what exactly is tall. Let us say the lower value is 155 centimeters and the upper value is 166 centimeters; I have just taken some values. Here I am writing the membership function: say this is 0 and this is 1. If the height is 155 centimeters, you probably
may not want to call the person tall; if it is 166 centimeters, surely you would like to call the person tall, and likewise for anything greater than 166. Anything less than or equal to 155 is surely not tall. Between 155 and 166 the membership for the attribute "tall" increases from 0 until it reaches 1. This is called the membership function.

Many times in reality we use adjectives without a clear-cut, exact, and precise mathematical formulation. Yet when I use those adjectives in speaking to you, or you use them in speaking to me, we understand each other. We would like our computer also to understand these words. That is the basis for fuzzy sets, and more broadly for soft computing. It is also the basis for what is known as fuzzy logic. In the usual binary-valued logic, that is 0 or 1, a statement is either true or not true. In fuzzy logic, a statement is true with some membership value, say 0.5, 0.6, or 0.7, and it is not true, again, with some membership value. You can have membership functions like that in many, many problems, which is actually the case in reality: most of the time we do not make mathematically precise statements, and even then we understand each other. If we would like our computer also to understand the language we are speaking, then the analysis should be done in such a way that this understanding is possible. So one of the ways that has been suggested is
to use these membership values in the analysis. There have been very many developments after fuzzy sets. Neural networks are considered to be a part of soft computing: multi-layer perceptrons, radial basis function networks, principal component analysis networks. Genetic algorithms are considered to be part of soft computing, and so are rough sets. Soft computing actually means that the input can be imprecise, ambiguous, and sometimes even incorrect. When we discuss things, we sometimes make incorrect statements, but even then we understand each other. The basic significance of all these techniques is that we would like our computer to understand and behave as a human being does: since we still understand each other despite sometimes making incorrect statements, how do you make the computer also understand these sorts of things?

That is the brief introduction to soft computing: fuzzy sets, neural networks, genetic algorithms, and rough sets. I am not going to discuss all of these in this class. I have talked only about fuzzy sets because I will be using them in the fuzzy c-means algorithm. With fuzzy sets you basically have some membership values, and the membership values are used in several applications; that is the reason why I introduced fuzzy sets.

A set A is said to be a fuzzy set if the membership values lie between 0 and 1 and there is at least one point for which the membership value is strictly between 0 and 1. If all the membership values are either 0s or 1s, with nothing in between, then the set is called a crisp set. To repeat, a fuzzy set means there exists at least one membership value which lies strictly in
between 0 and 1: it is neither 0 nor 1. There is at least one x for which $\mu_A(x)$ is strictly between 0 and 1. So that is fuzzy sets.

Now, using fuzzy sets, we can also discuss what is known as a fuzzy c-partition. First let me tell you the meaning of an ordinary partition; we all know it, but I will repeat it, and then we will go on to the fuzzy c-partition. Here c is an integer with $c \geq 2$, and let us say the set is $S = \{x_1, \ldots, x_n\}$. A usual c-partition of S, that is, a partition of S into c subsets, is represented as $P = \{A_1, A_2, \ldots, A_c\}$. The definition is: each $A_i \neq \emptyset$; $A_i \cap A_j = \emptyset$ for $i \neq j$; and $\bigcup_{i=1}^{c} A_i = S$. So this is a usual c-partition.

Now what is a fuzzy c-partition of S? Basically you need to give membership values. A fuzzy c-partition of S is represented by U(S), where U is an $n \times c$ matrix whose i-th row, j-th column element is written as $u_{ij}$, and $u_{ij}$ denotes the membership value of the i-th point to the j-th fuzzy set. Naturally the number of points is n, so $1 \leq i \leq n$ and $1 \leq j \leq c$. The matrix must satisfy the following properties. I do not want to
write them below, because I need that space, so let me state them here; just one minute, I will erase this portion. Shall I start? The first property is, naturally, $0 \leq u_{ij} \leq 1$ for all i, j, since these are membership values. The second is $\sum_{j=1}^{c} u_{ij} = 1$ for every i. This means you have fixed the i-th point: its membership for the first set is $u_{i1}$, for the second set $u_{i2}$, for the j-th set $u_{ij}$, and for the c-th set $u_{ic}$, and their summation is equal to 1. In the usual partition, if a point x belongs to one of the sets, it does not belong to any other set; so if you look at all the membership values for that point, you get 1 at one place and 0 at all the other places, and the summation is 1. Here too the summation is 1, but it is not necessarily true that you have 1 at exactly one place and 0 at the others; you may get fractional values.

The third property is $0 < \sum_{i=1}^{n} u_{ij} < n$ for every j. That is, you are now fixing the set and looking at the sum of all the membership values for that particular set; it should be strictly greater than 0 and strictly less than n. The corresponding property in the usual partition is $A_i \neq \emptyset$ for all i, which means every set has at least one point. If a set has no points, then all its membership values are 0, and the summation is also 0. So this is actually an extension of that property. If the $u_{ij}$ satisfy all three properties, then you say that U is a fuzzy c-partition of the data set.

Now, when we had the usual partition, what did we do? If you remember, for the c-means or k-means algorithm we defined an
objective function. What was it? You have a partition, and for every i from 1 to c we defined $v_i$ as the mean of $A_i$. Suppose the set S is a subset of m-dimensional Euclidean space; then we can talk about the mean of $A_i$: the sum of all the points in $A_i$ divided by the number of points in $A_i$. Then we looked at the double summation $\sum_{i=1}^{c} \sum_{x \in A_i} \|x - v_i\|^2$. This is the objective function. What does it do? For each point x, you look at which cluster it belongs to, say $A_i$; you take the mean $v_i$ of that cluster, find the distance between x and $v_i$, and square it. You do that for every x in $A_i$ and for i from 1 to c. So this is basically the within-cluster distance. The partition that provides the minimum within-cluster distance is taken as the best partition, so this is to be minimized over all partitions. This is the minimum within-cluster distance criterion.

I am using both the words c-means and k-means for the usual algorithm. The reason is that when this algorithm was proposed, sometime in the 60s, it was proposed as the k-means algorithm, and that is the name people have been using. The fuzzy set theory version was devised in the late 70s and early 80s by Bezdek, and he used c for the number of clusters. You give some name to the number of clusters and then you devise the algorithm. So c or k, it is the number of clusters, and here c is the number of clusters, so
it is the c-means algorithm. The criterion is the minimum within-cluster distance, but we are not in a position to find that minimum directly, since the number of partitions we would need to search over is too large. That is why the k-means, or c-means, algorithm was devised.

In the fuzzy set theory setup we have also defined a partition, and there is an objective function. It is the following. We take a parameter r strictly greater than 1. Let $v_j$, the mean for the j-th cluster (j is for the clusters), be $v_j = \sum_{i=1}^{n} (u_{ij})^r x_i \,/\, \sum_{i=1}^{n} (u_{ij})^r$. We are multiplying $x_i$ by $u_{ij}$ to the power r. If the $u_{ij}$ are either 0s or 1s, then in the numerator the $x_i$ at the places where $u_{ij}$ is 1 are summed, and in the denominator the sum of the $(u_{ij})^r$ gives the number of points in that cluster. So this is the summation of the $x_i$ divided by the number of points, which is the mean.

Now the objective function, which depends on this value r. If you have a fuzzy c-partition U of a set S, then with respect to the parameter r it is $J_r(U, S, A) = \sum_{i=1}^{n} \sum_{j=1}^{c} (u_{ij})^r (x_i - v_j)' A (x_i - v_j)$, where A is a positive definite matrix. Note that A should have m rows and m columns. If A is the identity matrix and the $u_{ij}$ are either zeros or ones, then what you are going to get is
the earlier objective. Let me repeat it: if A is the identity matrix and the $u_{ij}$ are either zeros or ones, with nothing in between, then this summation is the same as the within-cluster distance I wrote before. So we are looking at it in a much more general setup: first, the entries of U are not just zeros or ones but can be something in between; we have introduced a positive definite matrix; and there is an r.

This is to be minimized. That is the problem formulation, and I will write it here: given S, the data set, A, the positive definite matrix, and r, the exponent, one needs to find the optimal U which minimizes $J_r(U, S, A)$, where U is the membership value matrix.

So till now we have been discussing the problem formulation in the fuzzy c-means algorithm. We have defined an objective function J: for every fuzzy c-partition U of the set S, for a given r greater than 1, and for a given positive definite matrix A, the objective function value is given by this expression. We are supposed to find the best U, best in the sense that it minimizes $J_r(U, S, A)$.

Now how does one do it? Before I go into the algorithm, I would like to mention a few things. This algorithm was first proposed by James Bezdek, a very famous figure, and a proof was also given that the algorithm converges. Then the proof was modified, and ultimately, after I should say a few iterations, the correct proof was found. The algorithm was the same throughout. The proof needs quite a bit of knowledge of mathematics, including topology and other such fields. I will not go into the proof of the algorithm; I will
basically tell you the steps of the algorithm. This is the FCM algorithm, F for fuzzy: the fuzzy c-means algorithm. What are the steps?

First, we are given the set S, which has n points and is a subset of $\mathbb{R}^m$, the m-dimensional space; we are given the positive definite matrix A, the exponent r, and of course the number of clusters c. I am saying that we are given r, but usually the value is chosen by the user: something greater than 1, and often very close to 1, such as 1.01; some people may choose 2 also. Second, we start with a fuzzy c-partition U of S. Third, once U is there, you can always calculate the means $v_j$ from the expression for the mean, $v_j = \sum_{i} (u_{ij})^r x_i \,/\, \sum_{i} (u_{ij})^r$. Fourth, once you have calculated the $v_j$, you can calculate the $u_{ij}$. The expression is $u_{ij} = \left[ \sum_{k=1}^{c} \left( d_{ij}^2 / d_{ik}^2 \right)^{1/(r-1)} \right]^{-1}$. Here r is greater than 1, which is why you can write r - 1 in the exponent. What are $d_{ij}^2$ and $d_{ik}^2$? You can see why it is written as d: d is distance, and $d_{ij}^2 = (x_i - v_j)' A (x_i - v_j)$ is the squared distance of the i-th point to the j-th mean, the same quantity that appears in the objective function. In the ratio, $d_{ij}^2$ is in the numerator and $d_{ik}^2$, for k from 1 to c, is in the denominator; the ratio is raised to the power 1/(r - 1), and the whole sum to the power -1. So, given U, you compute the v's; from the v's, U again; from U, the v's again; and so on. The fifth step is: go to step 3 if the convergence criterion is not satisfied. What is the convergence criterion? You
have now a previous fuzzy c-partition and a present fuzzy c-partition. You take the difference of the two, which is again a matrix, and take the modulus of each entry. If each of them is less than some epsilon, you stop; otherwise you continue, on and on, until that is satisfied. You can have a different stopping criterion also. Here I used the stopping criterion on the U's; you can do the same thing on the v's. Take two consecutive sets of mean vectors, the previous mean and the present mean of the same cluster; if the distance between them is less than epsilon, and that happens for all c clusters, then also you can stop. Whether you do that or this does not matter. So for the convergence criterion you need a small value epsilon, and you need to calculate the difference between either two consecutive fuzzy c-partitions or two consecutive sets of c mean vectors; if each difference is less than epsilon you stop, otherwise you go back and calculate the next one.

The convergence theorem says that this process really converges: if you go on doing this, then ultimately the previous $u_{ij}$ and the present $u_{ij}$ will become the same, and likewise the previous mean and the present mean; the differences go towards 0. That is basically the convergence theorem.

Now the next point is what sort of clusters we are going to get. We have an algorithm, we can use it, it converges, and so you get a U. Since it is fuzzy clustering and you may ultimately be interested in an ordinary clustering, you can proceed in a few ways. One way is that all the membership values
which are greater than 0.5 are set to 1; or, for each point, you find the maximal membership value and, for whichever cluster attains it, give that entry 1 and the rest 0. This is a way followed by many, and in that way you get a crisp clustering. That is one thing you can do.

You can also keep the fuzziness as it is. Many times you need overlapping clusters, and what the fuzzy c-means algorithm ultimately provides is overlapping clusters, because the same point has membership in more than one class. In many situations overlapping clusters are important. Why? You do not want to say that a point is always in this cluster or always in that cluster; the point may really belong in more than one cluster. Why force it to go to only one? Keeping the information that it can go to both clusters may help you.

I will tell you one example where this sort of situation arises: satellite image processing. When you take the image of a place from a satellite, even if it is at, let us say, one meter resolution, suppose you take the bridge on the Cauvery river. The photograph is taken from above, so you see the bridge, and under it there is water. Those pixels can go either to water or to bridge, that is, to concrete structures. You may want to keep the information on these two clusters intact; you may not want to force the pixel to go always to water or always to the other cluster. The reason is that if someone is interested in crossing the river by boat, he would want the information that this is water; if you are
interested in crossing the river by the railway line, then you need the bridge. So why force the pixel to go always to this or to that? You may want to keep the overlapping information intact, so that you can use whichever part you need. Are you understanding me? You need not always force a point into one of the clusters. Keep it as it is, and only when you are ultimately forced to do a classification or a clustering, decide depending on the surroundings or on some other information: under these circumstances you would like to use water, so you use water.

These sorts of things are extremely necessary in very many real-life problems. For the same thing you may have more than one alternative; you may want to keep both alternatives intact and not force yourself into one of them, so that at exactly the right time you can decide which alternative to take. In fact, this is the reason why fuzzy set theory based classification and clustering have become popular: you can keep the overlapping information intact and use it later, depending on the problem.

Thirdly, there is one more thing. What sort of clusters are we going to get? If A is the usual identity matrix, you get essentially convex clusters again. The problems regarding non-convexity and other things need to be properly addressed in this context. With this I stop. Do you have questions?

How to choose the value of the exponent r? I do
not want to make a strong statement that r should always be chosen in this way or in that way. What I can say is that many persons choose r in some particular ways. For example, someone may choose r as something very close to 1, such as 1.0001, because they want to get something very close to the usual k-means algorithm. As you can see, if r were actually equal to 1, this would be like the usual k-means algorithm. So that is one of the reasons people choose r to be very close to 1: they want to make this as close to k-means as possible.

If your question is how to compare the performance of this algorithm for different values of r: naturally r has some impact on the final output, and the output changes if the value of r changes. That is true. Beyond that, I do not want to make any statement now about what will happen if r is very, very large or very, very small, because it depends on the data set and there are very many other things involved. Other questions?

This is one of the most popular algorithms based on fuzzy sets; many people apply the fuzzy c-means algorithm. In fact, probably the word fuzzy is a misnomer. See, $\sum_{j} u_{ij} = 1$: for each point, the sum of the memberships over the different clusters is 1. Then basically you are looking at a probabilistic setup: the probability of the i-th point going to cluster 1 is some value, to cluster 2 some value, and the sum of these probabilities is equal to 1. So probably the word fuzzy is a misnomer, because you are making
the summation equal to 1. Maybe that is also one of the reasons why many probabilists like this method. It also has a nice mathematical proof, though the proof is complicated. I went through each and every step of the proof; it is good, but it is a complicated one. Other questions?
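
The five steps of the algorithm described in this lecture can be sketched in code. What follows is only a minimal Python sketch under stated assumptions, not Bezdek's reference implementation: A is taken to be the identity matrix (so the distance is Euclidean), the initial fuzzy c-partition is random, r defaults to 2, and a point that coincides exactly with a mean is simply given full membership there. The function names (fuzzy_c_means, harden, and the helpers) are illustrative and do not come from the lecture.

```python
import random


def sq_dist(x, v):
    """Squared distance d^2 with A assumed to be the identity matrix."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def cluster_means(points, U, r):
    """Step 3: v_j = sum_i u_ij^r x_i / sum_i u_ij^r for every cluster j."""
    n, c, m = len(points), len(U[0]), len(points[0])
    means = []
    for j in range(c):
        w = [U[i][j] ** r for i in range(n)]
        total = sum(w)
        means.append([sum(w[i] * points[i][k] for i in range(n)) / total
                      for k in range(m)])
    return means


def update_memberships(points, means, r):
    """Step 4: u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(r-1))."""
    e = 1.0 / (r - 1.0)
    U = []
    for x in points:
        d2 = [sq_dist(x, v) for v in means]
        if min(d2) == 0.0:
            # Assumed convention: a point lying exactly on a mean gets
            # membership 1 there, which avoids dividing by zero.
            hit = d2.index(0.0)
            U.append([1.0 if j == hit else 0.0 for j in range(len(means))])
        else:
            U.append([1.0 / sum((dj / dk) ** e for dk in d2) for dj in d2])
    return U


def fuzzy_c_means(points, c, r=2.0, eps=1e-6, max_iter=300, seed=0):
    """Alternate steps 3 and 4 until no membership changes by eps or more."""
    rng = random.Random(seed)
    # Step 2: random initial fuzzy c-partition, each row normalised to sum 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    for _ in range(max_iter):
        means = cluster_means(points, U, r)
        new_U = update_memberships(points, means, r)
        # Step 5: largest absolute change between consecutive partitions.
        change = max(abs(a - b)
                     for old, new in zip(U, new_U)
                     for a, b in zip(old, new))
        U = new_U
        if change < eps:
            break
    return U, cluster_means(points, U, r)


def harden(U):
    """Crisp labels: each point goes to its maximal-membership cluster."""
    return [row.index(max(row)) for row in U]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    U, means = fuzzy_c_means(data, c=2)
    print(harden(U))
```

Calling harden on the returned U gives the crisp clustering by maximal membership; keeping U itself retains the overlapping information discussed above. The stopping rule here is the one on consecutive U matrices; the alternative rule on consecutive means could be substituted in the same place.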