I was mentioning the different types of errors; now let me give you the corresponding mathematical formulation. It goes like this. Say the number of classes is 2. The class-conditional probability density functions are p1(x) for class 1 and p2(x) for class 2, where x ∈ R^n, and the prior probabilities are P1 and P2, with P1 + P2 = 1. Now take Ω to be the set of all possible values that the n-dimensional feature vector can take; that set can be either R^n itself or a subset of R^n. For the sake of convenience I write Ω ⊆ R^n. Wherever x does not belong to Ω, that is, wherever x is in the complement of Ω, both p1 and p2 are 0, so I can just as well write x ∈ R^n. No problem.
Ω, then, is the set of all possible values of the n-dimensional feature vector; Ω can be R^n or a subset of R^n. Now, what is the meaning of making a decision? The meaning is that we divide Ω into a partition. Let the partition of Ω be Ω1 and Ω2, that is: (1) Ω1 ∪ Ω2 = Ω, (2) Ω1 ∩ Ω2 = ∅, and (3) Ω1 ≠ ∅ and Ω2 ≠ ∅. The two sets are non-empty, their intersection is the null set, and their union is the whole space. For such a partition we make a decision, which I represent by d(Ω1, Ω2). The meaning is: x ∈ Ω1 implies we put x in class 1, and x ∈ Ω2 implies we put x in class 2. Note that d(Ω1, Ω2) ≠ d(Ω2, Ω1). Why? Because the first set is always for class 1 and the second set is always for class 2, so swapping the two sets gives a different decision. Now look at the set D of all such d(Ω1, Ω2): this is the set of all possible decisions — not a single decision, every decision is here.

Now consider one decision d(Ω1, Ω2). On one side is the decision: x is put in class 1 or x is put in class 2. On the other side is the actual: x is from class 1 or x is from class 2. Let me repeat. If you take a point x, it may actually be from class 1 or from class 2, and your decision may put it in class 1 or in class 2. If x is actually from class 1 and you also put it in class 1, there is no harm; similarly if x is from class 2 and you put it in class 2, there is no harm. But if x is from class 1 and you put it in class 2, that is an error, and similarly if x is from class 2 and you put it in class 1, that is also an error. So for every decision d(Ω1, Ω2) you have two places where an error can occur. If you have three classes, at how many places can you have errors? Six. For two classes it is 2² − 2 = 2; for three classes, 3² − 3 = 6; for four classes, 4² − 4 = 12; and for k classes, the number of places where you can have an error is k(k − 1).
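The count above can be checked with a quick sketch (Python; the function name `error_types` is my own, purely for illustration): the error types are exactly the ordered pairs (actual class, assigned class) with the two entries different, i.e. the off-diagonal cells of a k × k table.

```python
# The "places where you can have an error" for a k-class problem are the
# ordered pairs (actual class, assigned class) with the two entries
# different -- the off-diagonal cells of a k x k table.
def error_types(k):
    return [(actual, assigned)
            for actual in range(1, k + 1)
            for assigned in range(1, k + 1)
            if actual != assigned]

for k in (2, 3, 4):
    print(k, len(error_types(k)))   # k*(k-1): 2, 6, 12
```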
So what is the mathematical formulation of this? Till now it is fine, but how do we calculate this error? Take the first possibility: x is from class 1 but we put x in class 2. We put x in class 2 only when x ∈ Ω2, so the probability of this event, given that x is from class 1, is

(*) ∫_{Ω2} p1(x) dx,

and similarly, for x from class 2 but put in class 1,

(α) ∫_{Ω1} p2(x) dx.

Each naturally depends on what Ω1 and Ω2 are; for a given partition, or decision rule d(Ω1, Ω2), each is the expression for one kind of error. Now, the probability that x is from class 1 is P1 and the probability that x is from class 2 is P2, so the total error, which depends on Ω1 and Ω2, is

ε(Ω1, Ω2) = P1 ∫_{Ω2} p1(x) dx + P2 ∫_{Ω1} p2(x) dx.

This is the total error probability, also known as the probability of misclassification. I would like to make a small remark: when I wrote this expression I was not bothered about the cost of misclassification, only the probability of misclassification. If a point from class 1 is put in class 2, the cost of that misclassification can be enormous while its probability is small, or the probability can be high while the cost is small. That is basically what I was telling you: whether you give the different errors equal weight depends on the problem under consideration. Here I am assuming equal weights — I would need to multiply each term by the corresponding cost of misclassification, which I am not doing, so I am assuming the misclassification costs are the same. If I get time I will discuss the cost of misclassification; for the case where the costs differ, you can look at Fukunaga's book on statistical pattern recognition, Introduction to Statistical Pattern Recognition (K. Fukunaga, F-u-k-u-n-a-g-a; I think it is Academic Press), where the expressions with costs of misclassification are also given.

Now let us see how to minimize this. What is the meaning of minimization? We need to get hold of the specific Ω1 and Ω2 for which this value is no larger than for every other decision: we need to find Ω1⁰ and Ω2⁰ (O for optimal) such that ε(Ω1⁰, Ω2⁰) ≤ ε(Ω1, Ω2) for all d(Ω1, Ω2) ∈ D. Then we say that d(Ω1⁰, Ω2⁰) is the optimal decision rule.
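To make the expression for ε concrete, here is a small numerical sketch (Python; the Gaussian densities, the prior values 0.6/0.4, the threshold partitions, and the name `error_probability` are all my own illustrative assumptions, not from the lecture). For a 1-D problem with Ω1 = (−∞, t] and Ω2 = (t, ∞), it approximates ε(Ω1, Ω2) = P1 ∫_{Ω2} p1 + P2 ∫_{Ω1} p2 by a Riemann sum.

```python
import math

# Illustrative setup (my own choice, not from the lecture):
# class 1 ~ N(0, 1), class 2 ~ N(2, 1), priors P1 = 0.6, P2 = 0.4.
P1, P2 = 0.6, 0.4

def p1(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p2(x):
    return math.exp(-0.5 * (x - 2.0) ** 2) / math.sqrt(2.0 * math.pi)

def error_probability(t, lo=-10.0, hi=10.0, n=20000):
    """epsilon(Omega1, Omega2) for the partition Omega1 = (-inf, t],
    Omega2 = (t, inf):  P1 * integral of p1 over Omega2
                      + P2 * integral of p2 over Omega1,
    approximated by a midpoint Riemann sum on [lo, hi]."""
    h = (hi - lo) / n
    eps = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h           # midpoint of each small interval
        if x <= t:                        # x in Omega1: class-2 mass is misclassified
            eps += P2 * p2(x) * h
        else:                             # x in Omega2: class-1 mass is misclassified
            eps += P1 * p1(x) * h
    return eps

# Different partitions give different error probabilities.
for t in (0.0, 1.0, 2.0):
    print(t, round(error_probability(t), 4))
```

The point of the minimization is exactly that some choices of the partition (here, of the threshold t) give a smaller ε than others.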
Now how does one find it? In the expression for ε I add P1 ∫_{Ω1} p1(x) dx and subtract the same quantity. Then what happens? The added term combines with the Ω2 term: P1 ∫_{Ω2} p1(x) dx + P1 ∫_{Ω1} p1(x) dx = P1 ∫_{Ω1 ∪ Ω2} p1(x) dx, and note that p1 is a probability density function defined over the whole space Ω, so this integral is 1 and the combined part is just P1. What remains gives

(a) ε(Ω1, Ω2) = P1 + ∫_{Ω1} (P2 p2(x) − P1 p1(x)) dx.

Similarly, if instead of adding and subtracting P1 ∫_{Ω1} p1(x) dx you add and subtract P2 ∫_{Ω2} p2(x) dx, what you get is

(b) ε(Ω1, Ω2) = P2 + ∫_{Ω2} (P1 p1(x) − P2 p2(x)) dx.

Now add (a) and (b). On the left you get 2ε(Ω1, Ω2); on the right, P1 + P2 = 1, so

2 ε(Ω1, Ω2) = 1 + ∫_{Ω1} (P2 p2(x) − P1 p1(x)) dx + ∫_{Ω2} (P1 p1(x) − P2 p2(x)) dx.

Minimizing ε(Ω1, Ω2) is the same as minimizing 2ε(Ω1, Ω2), and 1 is a constant, so we need to choose Ω1 and Ω2 so that the two integrals are minimized. Now let me write three sets:

C1 = {x : P1 p1(x) > P2 p2(x)},
C2 = {x : P1 p1(x) = P2 p2(x)},
C3 = {x : P1 p1(x) < P2 p2(x)}.

Let us look at the meaning of minimization. The quantity P2 p2(x) − P1 p1(x), depending on where x is located, may be greater than 0, equal to 0, or less than 0. Integration is basically summation, so to make the first integral the minimum possible we should collect into Ω1 exactly the places where P2 p2(x) − P1 p1(x) is less than 0, and likewise collect into Ω2 the places where P1 p1(x) − P2 p2(x) is less than 0; adding up all those negative quantities makes each integral the minimum possible. On C1, P1 p1(x) is strictly greater than P2 p2(x), so taking Ω1 = C1 minimizes the first integral; on C3, P1 p1(x) is strictly less than P2 p2(x), so we can take Ω2 = C3. But then what about C2, where P1 p1(x) = P2 p2(x)? Note that whether you keep such a point in the first set or the second, the corresponding difference is 0, so it is not adding anything. It does not matter whether you put the whole of C2 with the first set or with the second, or a part of C2 with the first set and the rest with the second — in any way you do it, it makes no difference to the error value. So, without loss of generality, take the optimal sets to be Ω1⁰ = C1 ∪ C2 and Ω2⁰ = C3. That means there can be many optimal decision rules, all of them giving the same value of the error probability; we can take any one of them, and every such decision rule is called a Bayes decision rule. It provides you the optimal decision.
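The two-class rule can be sketched in code (Python; the Gaussian densities, the prior values, and the name `bayes_decide` are my own illustrative assumptions): the Bayes decision compares P1 p1(x) with P2 p2(x) and, without loss of generality, sends the ties (the set C2) to class 1.

```python
import math

# Illustrative two-class setup (my own choice): class 1 ~ N(0, 1),
# class 2 ~ N(2, 1), priors P1 = 0.6, P2 = 0.4.
P1, P2 = 0.6, 0.4

def gaussian(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def bayes_decide(x):
    """Bayes decision rule with Omega1_0 = C1 union C2, Omega2_0 = C3:
    decide class 1 when P1*p1(x) >= P2*p2(x), else class 2.
    Ties (the set C2) go to class 1, without loss of generality."""
    return 1 if P1 * gaussian(x, 0.0) >= P2 * gaussian(x, 2.0) else 2

# For these Gaussians the set C2 (the decision boundary) is the single
# point where P1*p1(x) = P2*p2(x), i.e. x = 1 + 0.5*ln(P1/P2).
boundary = 1.0 + 0.5 * math.log(P1 / P2)
print(boundary, bayes_decide(boundary - 0.1), bayes_decide(boundary + 0.1))
```

By the derivation above, any other partition of the line gives a probability of misclassification greater than or equal to that of this rule.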
Optimal from what point of view? From the point of view of minimizing the probability of misclassification, minimized over every decision rule: whatever decision rule you take — note that I have put no condition on Ω1 and Ω2, you can take any decision rule, no problem — its probability of misclassification will be greater than or equal to that of the Bayes decision rule. The places where P1 p1(x) = P2 p2(x) form the decision boundary between the classes: on one side P1 p1 is greater, on the other side it is less, and in most cases the boundary region is exactly the set where P1 p1(x) = P2 p2(x). This is the Bayes decision rule, d(Ω1⁰, Ω2⁰).

Note that this decision rule was derived for two classes; you can have the corresponding thing for 3, 4, 5 — in fact any number of classes. You may take this as an exercise, to see whether you can solve it or not: for a 3-class classification problem, derive the decision rule which minimizes the probability of misclassification. The decision rule will be like this. If you have 3 classes, your class-conditional probability density functions are p1, p2, p3 and your prior probabilities are P1, P2, P3, and the optimal decision rule is: if x ∈ Ω1⁰ put it in class 1, if x ∈ Ω2⁰ put it in class 2, if x ∈ Ω3⁰ put it in class 3, where

Ω1⁰ = {x : P1 p1(x) ≥ P2 p2(x) and P1 p1(x) ≥ P3 p3(x)},
Ω2⁰ = {x : P2 p2(x) ≥ P3 p3(x) and P2 p2(x) > P1 p1(x)},
Ω3⁰ = {x : P3 p3(x) > P1 p1(x) and P3 p3(x) > P2 p2(x)}.

It basically follows whatever we got in the 2-class case. Since I would like the union to be the whole space, I included the equality parts carefully: ties with P1 p1 go to class 1, the tie between P2 p2 and P3 p3 goes to class 2, and in Ω3⁰ the inequalities are strict, so that the union is the whole space. Generally what you will see in the books is that strict inequality should hold for at least one of the classes; wherever all of them are equal, such points are not put into any class. In principle there is nothing wrong there, because the books have not imposed the condition that the rule be exhaustive — you need not classify each and every point, and where there is some such confusion they just keep it like that. Here I have put every point into the first, second, or third class; the tie set makes only a negligible difference. The same rule — a similar rule — you can extend to M classes, which is what I showed you in the slide: for M classes, the basic thing is that you put x in class i if P_i p_i(x) ≥ P_j p_j(x) for all j ≠ i.
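The M-class rule, with a concrete tie-break, can be sketched as follows (Python; the 3-class Gaussian setup, the equal priors, and the name `bayes_decide` are my own illustrative assumptions): put x in class i if P_i p_i(x) ≥ P_j p_j(x) for all j ≠ i, resolving ties toward the smallest index — any fixed tie-break gives the same misclassification probability.

```python
import math

# Illustrative 3-class setup (my own choice): equal priors and
# unit-variance Gaussian densities with means -2, 0, 2.
PRIORS = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
MEANS = [-2.0, 0.0, 2.0]

def gaussian(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def bayes_decide(x):
    """Put x in class i when P_i * p_i(x) >= P_j * p_j(x) for all j != i.
    max() keeps the first of several equal maxima, so ties are broken
    toward the smallest class index -- any fixed tie-break gives the
    same probability of misclassification."""
    scores = [P * gaussian(x, mu) for P, mu in zip(PRIORS, MEANS)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)

print([bayes_decide(x) for x in (-3.0, -1.5, 0.0, 1.5, 3.0)])
```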
That inequality, P_i p_i(x) ≥ P_j p_j(x) for all j ≠ i, is the general thing that you will find in books. Our background is basically computer science; we must have looked at many cases where you have ties — whenever you have written algorithms, in many places you must have got ties, and for ties you generally adopt some sort of compromise. Here also you need some such compromise, but since ties make no difference to the misclassification probability, wherever you would like to put them, just put them; it does not matter. I think I will stop here.