So, to summarize: we have the expression of the linear decision boundary for one of the simplest cases, a shared diagonal covariance matrix with identical diagonal elements, which means the variance is the same along every dimension. If you have a d-dimensional feature vector, then all d features have the same variance along their respective dimensions; this basically means the scatter or spread of color as a feature is the same as that of height, the same as weight, the same perhaps as texture or some other feature. In such cases, let us look back at the expression of the linear decision boundary, which is of course obtained by setting the two discriminant functions equal, g_k(x) = g_l(x). It is a linear form, and we had just seen that it can also be written in the crisp form w^T(x - x0) = 0, where x0 is given by a particular expression. I left the derivation of x0 as a take-home assignment; you should be able to derive it from the expression given at the top. So finally you have this as the expression of the linear decision boundary, where w is given by the difference of the means divided by sigma squared, w = (mu_k - mu_l)/sigma^2. In the simpler case, when we had taken the covariance matrix to be the identity matrix, w was just the difference of the two means, because sigma was equal to 1. So this is a slight generalization: the matrix is diagonal like the identity matrix, but the variance is not equal to unity; the diagonal elements are identical, so the same sigma appears here, and x0 is as given here. Let us copy these expressions onto the board, because we will be comparing figures. We had seen some examples of decision boundaries in the last class; we will see a few more, and we will compare two or three different cases of the covariance matrix. We will take an arbitrary covariance matrix which is not diagonal, with off-diagonal terms, and ask: what is the effect of the class prior in such cases? What is the effect of the covariance matrix being identical across different classes, or, if it is not identical across classes, what are the effects then?
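For reference, since the slide expressions are referred to but not reproduced in this transcript, the standard case-A result being described (as in Duda and Hart's Pattern Classification) can be written as follows; this is a reconstruction, so the slide's notation may differ slightly:

```latex
% Case A: \Sigma = \sigma^2 I, identical for all classes.
% Linear decision boundary: w^\top (x - x_0) = 0, with
w = \frac{\mu_k - \mu_l}{\sigma^2},
\qquad
x_0 = \frac{\mu_k + \mu_l}{2}
      - \frac{\sigma^2}{\lVert \mu_k - \mu_l \rVert^{2}}
        \,\ln\frac{P(\omega_k)}{P(\omega_l)}\,(\mu_k - \mu_l)
```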
Those are the different cases we will study. Let us write the equations on the board for case A, and then we will move on to the other cases. What did we get as w? It is the difference between the two means divided by sigma squared, w = (mu_k - mu_l)/sigma^2. And what is the expression of x0? Originally, the boundary should have been the perpendicular bisector, so x0 should start from the midpoint between the two means, (mu_k + mu_l)/2, plus a correction term. Can you give me that term? There should be a variance factor, and it is the factor which depends on the class priors. It should be proportional to sigma squared in some sense; it is a vector that moves along the line joining the two means, so you should have mu_k - mu_l (there is a reason behind this), divided by its norm, so that it acts like a unit vector; you can put vector signs on all of these if you want. Then, of course, this should be easy to guess: there is a log of the ratio of the priors. Was it P_k over P_l or P_l over P_k? Let us look at the expression: it is k over l. So x0 = (mu_k + mu_l)/2 - (sigma^2 / ||mu_k - mu_l||^2) ln(P(omega_k)/P(omega_l)) (mu_k - mu_l). This is what we have to remember, and this is what we are considering as case A. If you want a two-dimensional picture of this: here is the class mean mu_k for class k, and here the corresponding mu_l for class l. Since sigma is diagonal with identical elements, the isocontours of the multivariate Gaussian are circles (hyperspheres in higher dimensions); we drew one or two of them in the last class for one particular class, and the same holds for the other class as well. And what is that vector? It is mu_k - mu_l; we will go back to the slides and check whether this is correct. In the previous, most simple case, when we had taken the covariance to be an identity matrix, the sigma^2 factor was not there, so w was basically just mu_k - mu_l. We also noted that w is normal to the plane of the linear decision boundary: a line in two dimensions, a plane in three, a hyperplane beyond that. At the decision boundary, consider this particular point (I will just change the position of this arrow and put it here): this is the point x0 on the line joining the means, and the linear decision boundary passes through it, orthogonal to w. If the class priors are the same, the log-prior term vanishes. Of course, you could say it also vanishes if sigma^2 is 0, but that has no meaning, because these are the diagonal terms of the covariance matrix and we expect some non-zero variance; in practice, a feature such as color will not take the identical value across different samples. The remaining factor is a unit vector, so x0 basically moves up and down along the line joining the means depending on the log of the prior ratio, and a larger class prior in the numerator will actually push it toward mu_l. Remember this is a vector, so plus and minus move the point toward k or toward l in some order. And is there a negative sign here? Yes, there is.
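Here is a minimal numeric sketch of these case-A formulas; the function name and the example numbers are mine, for illustration only:

```python
import numpy as np

def case_a_boundary(mu_k, mu_l, sigma2, p_k, p_l):
    """Linear boundary w.(x - x0) = 0 for shared covariance sigma2 * I."""
    mu_k, mu_l = np.asarray(mu_k, float), np.asarray(mu_l, float)
    diff = mu_k - mu_l
    w = diff / sigma2                       # normal to the decision boundary
    x0 = (mu_k + mu_l) / 2.0 \
         - (sigma2 / np.dot(diff, diff)) * np.log(p_k / p_l) * diff
    return w, x0

# Equal priors: x0 is exactly the midpoint of the two means.
w, x0 = case_a_boundary([0.0, 0.0], [4.0, 0.0], sigma2=4.0, p_k=0.5, p_l=0.5)
print(w, x0)    # w = [1, 0], along the line joining the means; x0 = [2, 0]
```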
The negative sign means that if the prior in the numerator is larger, x0 moves toward the other side; with the opposite sign it would move toward k. That is what we had seen with some of the examples, so we leave this diagram here. If you want to draw linear decision boundaries for various priors of classes k and l, you can have something like this, or for a large value you can have it like this: if P(omega_l) is very large compared to P(omega_k), say 0.9 or more, the linear decision boundary can even come here, past this point, and it can pass to the other side as well if the prior for class k is the larger one. These are some of the diagrams we had seen in the slides in the last class in two dimensions, and we are going to see a few more such cases today before we move on to the case of an arbitrary, non-diagonal covariance matrix sigma with off-diagonal terms. What we have so far is a special case where the off-diagonal terms of the covariance matrix are all equal to 0; that is why we could write the weight vector as (mu_k - mu_l)/sigma^2, which is not possible in general, because in general you need to keep the whole covariance matrix in the expression, as we will see. Before that, let us have a look at some diagrams available in the public domain. This is one of the cases we just discussed: these are the class-conditional probability distributions for omega_1 and omega_2, the two classes we have been calling k and l, and you can see the two class means, similar to the diagram I have drawn on the board, with the corresponding isocontour lines of constant distance from the class means. This is the decision boundary when the two class priors are the same, 0.5 and 0.5: the very simplest case, where the decision boundary is a line normal to the line joining the class means. This is the same picture viewed as a projection of the two distributions onto the plane: two distributions, two class mean vectors mu_1 and mu_2, and the decision boundary orthogonal to the line joining them. And this is the case I was talking about a moment ago: one of the class priors is made equal to 0.9 (I hope you can see that) and the other is equal to 0.1. We discussed that the decision boundary will then no longer pass through the midpoint of the line joining the two class means; it remains perpendicular to that line, but it passes through a shifted point, and in fact here it has crossed beyond the mean of class 2. This is because the class prior for class 1 is larger than that for class 2. The other factor you saw in the expression is the variance term: the larger the variance, the greater the shift, or drag, of the linear decision boundary toward one of the class means. So the amount of drift of the decision boundary depends on two factors: the log-prior ratio and the variance in the covariance matrix.
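Continuing the sketch above with the 0.9 versus 0.1 priors just discussed (means and variance still illustrative), the shift can indeed carry x0 past the mean of the less probable class, as in the slide:

```python
# Same illustrative means and variance, but skewed priors of 0.9 and 0.1.
w, x0 = case_a_boundary([0.0, 0.0], [4.0, 0.0], sigma2=4.0, p_k=0.9, p_l=0.1)
print(x0)       # approx [4.2, 0]: beyond mu_l = [4, 0], as in the 0.9/0.1 slide
```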
Good. Let us move on to the next case: an arbitrary covariance matrix which is still identical for all classes. We can no longer say that the off-diagonal terms of sigma are zero; we do have non-zero off-diagonal terms, and "arbitrary" also means that we do not have equal variances along the diagonal. So we have sigma_1^2, sigma_2^2, and so on up to sigma_d^2 along the diagonal, and sigma_ij will typically be non-zero. But the covariance matrix is the same for every class. When we write the expression for the discriminant function g_i(x) for a particular class, you can notice that the class mean mu_i appears, as in the earlier case, and we have the log-prior term as well, but there is no class subscript on the covariance. Why not? Because this sigma is identical for all classes. Given this expression, we can again remove the quadratic term, the x^T Sigma^{-1} x term, exactly as we have done so far, because it is common to all classes; we are then basically left with the terms linear in x, plus constant terms that depend on the covariance matrix. For the individual discriminant function of a particular class you have Sigma^{-1} here and the log-prior term, but no subscript on the covariance, for the same reason: it is identical for all classes. This gives us the linearized form of the discriminant function: the linear term comes from this term multiplied by x, so the corresponding weight vector is w_i = Sigma^{-1} mu_i, the class mean pre-multiplied by the inverse of the covariance matrix (the transpose sign is there in the expression anyway), and the other two terms go into the bias term. So linearization is possible as long as the arbitrary sigma is identical for all classes. Again, as I mentioned a while back and in the earlier class, this is an unrealistic assumption: you are assuming that the variance of the color of a certain flower is the same as that of a fruit, if fruits and flowers are your two classes, that the weights vary by a similar amount, and so on and so forth. Given these expressions for g_i(x), the linear decision boundary, obtained the same way as earlier by setting g_k(x) = g_l(x) for two different classes k and l (you can write it in terms of i and j if you like), has each w_i from the previous slide equal to the class mean pre-multiplied by the inverse of the covariance matrix, together with the corresponding bias term. If you substitute these here, this is what you get; in fact, you should be able to rewrite the difference of the two bias terms as a function of the weight vector. Again I leave this as an exercise for you. We wrote this earlier for the case of a diagonal sigma, and you can see that the expression has the same shape; we are going to write it again on the board. The boundary passes through the midpoint of the line joining the two class means, minus a correction factor that is the same as before except that sigma^2 is replaced by the inverse of the covariance matrix; that correction is a vector, so the decision boundary again moves along the line joining the two class means.
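As in case A, the standard closed form being described here (cf. Duda and Hart; reconstructed, so the slide notation may differ) is:

```latex
% Case B: shared arbitrary covariance \Sigma, identical for all classes.
% The boundary w^\top (x - x_0) = 0 is still linear, with
w = \Sigma^{-1}(\mu_k - \mu_l),
\qquad
x_0 = \frac{\mu_k + \mu_l}{2}
      - \frac{\ln\bigl(P(\omega_k)/P(\omega_l)\bigr)}
             {(\mu_k - \mu_l)^\top \Sigma^{-1} (\mu_k - \mu_l)}\,
        (\mu_k - \mu_l)
```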
So the covariance matrix is now arbitrary, no longer diagonal as in case A, but it is identical for all classes; that is what the word "identical" means here. The overall expression does not change: it is still a linear decision boundary, where w is now given by the inverse of the covariance matrix times the difference of the means, w = Sigma^{-1}(mu_k - mu_l), and the corresponding x0 (let me make a little space; the transpose should be here, correct) is the midpoint minus (mu_k - mu_l), divided by the quadratic form (mu_k - mu_l)^T Sigma^{-1} (mu_k - mu_l), multiplied by the log of the prior ratio. You can now compare the two expressions: in the strictly diagonal case the numerator remains the same, and the small correction is in the denominator, where sigma^2 is replaced by the inverse of the covariance matrix inside the quadratic form. Notionally it is the same: the inverse of the covariance matrix gives you 1/sigma^2 when it is diagonal with identical elements, and that factor moves to where it sat before, so case A is a simplified version of case B. If you want to draw the diagram in a similar manner (I am not drawing the isocontour lines; slides are coming up for that), then for the sake of simplicity say this is my class mean mu_k and that is the other class mean mu_l, for two classes k not equal to l. Let us first draw just the numerator term, without the covariance matrix, which is mu_k - mu_l; I will mark it with a dashed arrow. This is the same vector we drew for the simplest case A, where it differed from w only by a scale factor. Now this vector is multiplied by the inverse of the covariance matrix, and we know the covariance matrix is symmetric, so its inverse is symmetric as well. I leave it as an exercise for you to visualize what this does to the vector: if the matrix were orthogonal it would mean a pure rotation, but a symmetric matrix acts like a d-dimensional rotation combined with scaling along its eigen-directions, a sort of linear distortion of the vector. So you can visualize that w, which previously lay along the line joining the means, will now get tilted (not to scale, of course), giving you a vector something like that: that is my w. The amount of tilt of w with respect to the line joining the class means depends on the covariance matrix, which is arbitrary here; you can just say it is some sort of tilted vector now. What happens to the point x0 is the same effect we had earlier: the log-prior term vanishes if the two class priors are the same, and then we are talking about a boundary passing through the midpoint between the two class means, which remains the same. And one thing you must always keep in mind is that this w vector is normal to the linear decision boundary; I have drawn the boundary as a dashed line and I am going to keep that convention, maybe using a slightly different color.
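And a corresponding numeric sketch for case B; again, the function name and numbers are illustrative assumptions, not values from the slides:

```python
import numpy as np

def case_b_boundary(mu_k, mu_l, Sigma, p_k, p_l):
    """Linear boundary w.(x - x0) = 0 for a shared, arbitrary covariance Sigma."""
    mu_k, mu_l = np.asarray(mu_k, float), np.asarray(mu_l, float)
    diff = mu_k - mu_l
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ diff                     # normal to the boundary, now tilted
    x0 = (mu_k + mu_l) / 2.0 \
         - (np.log(p_k / p_l) / (diff @ Sigma_inv @ diff)) * diff
    return w, x0

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])               # symmetric, with off-diagonal terms
w, x0 = case_b_boundary([0.0, 0.0], [4.0, 0.0], Sigma, 0.5, 0.5)
print(w)    # not parallel to mu_k - mu_l: the normal is tilted by Sigma^{-1}
print(x0)   # equal priors: x0 is still the midpoint [2, 0]
```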
This dashed line, then, could be your decision boundary: the linear DB, orthogonal to w, passing through the midpoint between the two class means if the log-prior term is equal to 0, which is the same concept we discussed earlier. If that term is non-zero, x0 is bound to move along the line joining the means, because the term is a vector dictated by that direction. So you can draw different decision boundaries here, all of them orthogonal to w, or strictly speaking, with w normal to each DB: all the boundaries that appear for different values of the log-prior term, which shifts the point x0 along the line joining the two class means. So this point might shift from here to here; these are potentially some other points where your x0 might occur, but at those points the normal is still the same w. It does not change with this term; the class priors do not affect w. So the normal is the same, the linear decision boundaries are all parallel, and the boundary just shifts up and down along the line joining the means depending on the priors. Now let us look at some examples. If you look back at the expression once again, the weight vector of the linear DB is w = Sigma^{-1}(mu_k - mu_l): it is just the transformed version of the line joining the two means. Remember that the covariance matrix is symmetric, so its inverse is also symmetric, and w can be considered a tilted, rotated version of the vector joining the two means. I have a simple example of this in two dimensions, where the covariance matrix is diagonal but with non-identical diagonal elements. Remember, this is not yet fully arbitrary; for the sake of analysis we have taken a diagonal matrix with unequal elements, so this sits somewhere between cases A and B: in case A we had a diagonal with identical elements, while here it is identical over the classes and diagonal, but with non-identical diagonal elements, as given here. The corresponding weight vector is this (I leave the computation to you as an exercise), and the direction of the corresponding DB is this: w is no longer along the line joining the two means. This is the vector joining the two class means, and w is a tilted vector orthogonal to the DB; this is the direction vector of the decision boundary, which is normal to w, so these two are orthogonal vectors, and the decision boundary is marked here in the graph. This is a special case again; let us carry on with another of the simplest examples: all diagonal elements equal to 1, but the matrix otherwise arbitrary. So now we are talking about a special case where the covariance matrix is arbitrary but the variance terms on the diagonal are equal to 1: sigma_1^2 = sigma_2^2 = 1 along the diagonal, with an off-diagonal term; since it is a 2D case, it is a simple 2x2 matrix.
It has 1 along the diagonal, and the off-diagonal terms are equal to sigma. In such a case you should be able to obtain the weight vector; I leave it as an exercise for you to derive the equation for w in this case, where the subscript D indicates the case with the diagonal elements equal to 1, nothing more than that. You can derive it very easily: this is the Sigma^{-1} you get, and the corresponding terms are given here. Next, here are some examples of sigmas which are diagonal in all cases, with increasing variance of feature 2 relative to feature 1. You can see that the decision boundary is no longer orthogonal to the line joining the two class means. Here it is almost so, but the variance along the y direction, feature 2, is more than 1, so you can see the isocontour is no longer a circle; it is becoming elliptical, and the boundary is already tilted. It is tilted more in this particular case, because the variance is much, much higher. And this is the case where the variance along x is higher than the variance along y: the DB is not orthogonal; this would have been the line orthogonal to the line joining the two class means, so this is a tilted case, and this one is tilted as well. These are not orthogonal; remember this figure is in two dimensions. Now let us have a look at this figure, obtained from a document on the web, where we have two class means, say for apples and oranges. What are the two features selected here? Weight and color are the two features for apples and oranges. You can see that the variance along the color dimension is large for apples, and also for oranges, whereas the weight feature has a lesser degree of variance. If you take the line joining the two class means here, the decision boundary would have been the orthogonal line shown by my moving cursor at this particular point, if we had circles as the isocontours of Mahalanobis distance from the centers; but in fact it is not so. What the point P illustrates is this: P may appear farther, by Euclidean distance, from the class mean of apple than from that of orange (you can see this distance is much more than the distance from P to the orange mean), but by the Mahalanobis distance criterion these are equal. The contours are labeled 1, 2, 3 and 4 for one class, and the same 1, 2, 3 and 4 for the other, so in this sense P lies on the contour at distance 4 from each: say this Mahalanobis distance is d = 4 for oranges, and d = 4 for apples as well, even though the Euclidean distance to the apple mean is much more than a value of 4. So observe this carefully; I repeat: the distance of this point P in the Euclidean sense is much greater from the apple class mean than from the mean corresponding to oranges; however, if you take the Mahalanobis factor, that is the covariance matrix, into account, the two distances are the same. That is why the decision boundary gets tilted: it is the locus of points whose Mahalanobis distances from the two means are identical.
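To make the apples-and-oranges point concrete, here is a small sketch with made-up numbers (they are not the figure's actual values): a point that is Euclidean-closer to one mean can still be Mahalanobis-closer to the other under a shared covariance with very unequal variances:

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance of x from mean mu under covariance Sigma."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

# Shared diagonal covariance: large spread along "color" (x), small along "weight" (y).
Sigma = np.diag([25.0, 0.25])
mu_apple, mu_orange = np.array([0.0, 0.0]), np.array([4.0, 2.0])

p = np.array([3.0, 0.8])
print(np.linalg.norm(p - mu_apple), np.linalg.norm(p - mu_orange))        # ~3.11 vs ~1.56
print(mahalanobis(p, mu_apple, Sigma), mahalanobis(p, mu_orange, Sigma))  # ~1.71 vs ~2.41
# Euclidean distance favors "orange", Mahalanobis favors "apple": the boundary tilts.
```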
If you observe the same situation in the 3D figure, with the corresponding class-conditional probabilities shown here, this is a case where the decision boundary has shifted away from the midpoint of the line joining the two class means toward one of them, because the class prior for class 1 is again much higher than the class prior for the other class, which is 0.1. It is so large that the boundary has not only moved toward the other class mean but gone further beyond it. So you can see both the effect of the class priors and the tilt in the decision boundary due to an arbitrary covariance matrix, which occurs when it is not diagonal. As long as the covariance was diagonal with identical elements, we always had the constraint that the weight vector w lay along the line joining the two class means and the decision boundary was like a perpendicular bisector; but not anymore: with an arbitrary covariance matrix the decision boundary is tilted, and its normal is itself a tilted version of the vector joining the two class means. Having seen that, we move to the last case, where you have an arbitrary sigma and all parameters are now class-dependent. This is different from case B: in case B we also had an arbitrary covariance matrix, but sigma was identical over all classes. Now, if you take flowers and fruits, the flowers will have Sigma_1 as their covariance matrix and the fruits will have something else. Let us look at the expressions. This is the expression for the discriminant function, and now you have to put back the class index on the covariance, which we could avoid for cases A and B. When we open up and write this expression in this particular form, as a non-linear discriminant function for the first time, the quadratic term is one we cannot eliminate anymore, because of the subscript i: its coefficient matrix is just a factor times the inverse of the class covariance matrix, Sigma_i^{-1}. The other terms are the same as what we already had, with minor differences: earlier the covariance was arbitrary as well, but the subscript was missing. So in general, the discriminant functions based on this expression for g_i, and the decision boundaries obtained by taking the difference of g_k and g_l for classes k and l, are what are called hyperquadrics, and the decision boundary can be obtained using this expression. Let us look at a simple case; this is an assignment I am leaving for you, picked from the book on pattern classification by Duda and Hart. These are the class means for a two-class problem, and these are the corresponding covariance matrices; we have kept them diagonal (sorry, not identical: we have kept the covariance matrices diagonal), but they are not the same for both classes. The off-diagonal terms are 0, all right, for the sake of simplicity, so that you can work it out as a home exercise with pen and paper, but the important fact is that they are not identical.
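For reference, the standard case-C expressions being described (as in Duda and Hart; reconstructed here, since the slides are not in the transcript):

```latex
% Case C: class-dependent covariance \Sigma_i, giving a hyperquadric
% decision boundary g_k(x) - g_l(x) = 0.
g_i(x) = x^\top W_i\,x + w_i^\top x + w_{i0},
\qquad
W_i = -\tfrac{1}{2}\,\Sigma_i^{-1},
\qquad
w_i = \Sigma_i^{-1}\mu_i,
% with the bias term
w_{i0} = -\tfrac{1}{2}\,\mu_i^\top \Sigma_i^{-1}\mu_i
         - \tfrac{1}{2}\ln\lvert\Sigma_i\rvert + \ln P(\omega_i)
```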
I leave it as an exercise for you to compute the covariance matrix inverse from the first one; since it is diagonal, the inverse is just one over each element. This is a direct extension of what we discussed on linear algebra and vector spaces, when we saw that the inverse of a diagonal matrix is just the reciprocal of its elements. Assume equal class priors, and I leave it as an exercise for you to write the expression of the decision boundary using these two class means and these two covariance matrices. Which formula will we use? Go back: this is the individual expression for g_i. Use that expression, set g_k = g_l, and you need just the covariance matrices, the class means and the class priors, which are all given: on the next slide you have the class priors and the class means, and for the covariance matrices use the previous slide. This is a solved problem in the book by Duda and Hart; you can refer to it for the answer. Now, quadratic decision boundaries: it is very nice to note that in d dimensions the boundary in general has an expression of this nature (we will come to those expressions very soon for case C), with two sets of terms responsible for the quadratic part, plus the linear term. In the special case when the dimension is 2, you can actually expand and write this expression term by term: count them, 1, 2, 3, 4, 5, 6; for d = 2 the number of terms is equal to 6. For dimension d = 2 this is the expression we have, with the bivariate cross terms, their coefficients, and the individual variances. In the above equation the total number of parameters is as given: why do you have 2d? You have d terms here and d terms here, that is 2d, plus 1 here; and if you look at the number of quadratic terms, it is basically a summation, which I leave as an exercise for you; the total number of terms is as given here, or you can write it in this alternative form. You then need to organize these coefficients in a matrix, because this is the expression we got for the non-linear decision boundary when we put g_k = g_l, and the expression of g_i itself was in this form when we were talking about an arbitrary sigma not identical for all classes. If you observe the expression in equation number 3, w_i is a vector containing d components, and W is a d-by-d matrix. What you basically need to do to match the terms is this: W is a symmetric matrix, and its d^2 - d off-diagonal entries are obtained from the cross coefficients. The coefficients w_ii sit on the diagonal of the capital W, and each w_ij is duplicated by halving it, with the two halves sitting in the symmetric off-diagonal positions of W. That is how you construct the expression of the quadratic surface in d dimensions, or a hyperquadric surface in d dimensions, from an arbitrary polynomial expression as given here; this is the vector, or matrix, representation.
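A tiny sketch of that bookkeeping in 2D; the coefficient names are mine, mirroring the w_ii and w_ij of the lecture. The cross coefficient is halved into the two symmetric off-diagonal slots so that x^T W x reproduces the polynomial:

```python
import numpy as np

# Polynomial: a*x1^2 + b*x2^2 + c*x1*x2  (illustrative coefficients)
a, b, c = 2.0, 3.0, 1.0

# Pack into a symmetric W: the diagonal gets w_ii, the cross term is split in half.
W = np.array([[a,     c / 2],
              [c / 2, b    ]])

x = np.array([1.5, -2.0])
quad_direct = a * x[0]**2 + b * x[1]**2 + c * x[0] * x[1]
quad_matrix = x @ W @ x
print(quad_direct, quad_matrix)   # identical: 13.5 13.5
```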
So in this equation for g_i, the symmetric matrix W is the part that contributes the quadratic terms, and in general what you get are called hyperquadric surfaces in three or higher dimensions. Of course, if W is proportional to the identity matrix you can get spheres or hyperspheres, and if W is the null matrix you just have linear decision boundaries: lines, planes and hyperplanes. Before we move ahead with some examples, let us summarize. What do we have for case C? Sigma is arbitrary and unequal, that is, non-identical, across classes. What was case A? Diagonal, the simplest case; of course, the very simplest case, not written on the board, was when sigma was an identity matrix, which we considered two classes back, where you effectively measure Euclidean distances from the class means; that can be considered a special case of A. In case A sigma is diagonal and identical; in case B sigma is arbitrary but identical over all classes; in case C sigma is arbitrary and unequal over the classes. So now we have the generic case: unequal diagonal elements, arbitrary off-diagonal elements, and not the same over all classes. That means the feature statistics of fruits are different from those of flowers, or of something else, say animal faces, in terms of image signals, if you are talking about recognizing, classifying or clustering patterns. In this particular case the expression (I am ignoring the subscript i for the time being; is it plus or minus w transpose? you need a transpose here, because it is a dot product, that is fine, and for an arbitrary i-th class discriminant function you have to put an i here as well as here) has a quadratic term, the covariance matrix inverse in x transpose Sigma^{-1} x, divided by 2 with a minus sign; the linear term, the covariance matrix inverse multiplied by the class mean; and w_0, which contains, yes, the log prior, together with two terms carrying minus one half: minus half mu transpose Sigma inverse mu, and minus half the log determinant of Sigma. As you see, the earlier cases are all special cases of this; in fact you should be able to map this onto those, especially for the linear terms. In terms of the decision boundary, I have not written it out, but you can do it, because what you will have for g_k - g_l = 0 is a quadratic part with Sigma_k^{-1} minus Sigma_l^{-1}, and a linear part with Sigma_k^{-1} mu_k minus Sigma_l^{-1} mu_l, for the two classes k and l; I will just write one particular term so you see what it looks like from the decision boundary point of view. Of course, if the covariance matrices are identical, you can go ahead and write it as before, because you can take out the common factor, and you recover the linear special case; the same thing holds good for the linear terms. The log-prior term looks different here, and in the true sense the earlier picture of simply sliding the boundary along a line has no meaning in the quadratic case, because what you are going to have are curved surfaces and hypersurfaces; the earlier expression was the linearizable special case of this one.
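A minimal sketch of the case-C discriminant in the form just described; the two-class parameters at the bottom are illustrative stand-ins, not the values of the book exercise:

```python
import numpy as np

def quadratic_discriminant(mu, Sigma, prior):
    """Return g_i(x) for one class with its own covariance Sigma_i."""
    mu = np.asarray(mu, float)
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    W  = -0.5 * Sigma_inv                      # quadratic coefficient matrix
    w  = Sigma_inv @ mu                        # linear coefficient vector
    w0 = -0.5 * mu @ Sigma_inv @ mu - 0.5 * logdet + np.log(prior)
    return lambda x: x @ W @ x + w @ x + w0

# Illustrative two-class problem: diagonal but unequal covariances.
g1 = quadratic_discriminant([0.0, 0.0], np.diag([1.0, 4.0]), 0.5)
g2 = quadratic_discriminant([3.0, 0.0], np.diag([4.0, 1.0]), 0.5)

x = np.array([1.0, 1.0])
print("class 1" if g1(x) > g2(x) else "class 2")   # larger discriminant wins
```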
In fact, the weight vector is still the part you can read off from the linear term, but there is a quadratic term, coming from here and sitting here as well, which was not there in the previous case. So in general, if you have scenarios where, say, this is mu_k and this is mu_l (I will draw these diagrams on the board), it is possible that you have isocontours like this for one class and like that for the other, with arbitrary covariance matrices. It is anybody's guess what the decision boundary would be, but it is very unlikely to be linear, because both effects are present: first of all the diagonal terms are not the same, and there are off-diagonal terms as well. So it is possible that the boundary is something like that; it depends on the covariances, and it is anybody's wild guess; it is not often easy to visualize, even in two dimensions, unless you have seen such examples, and you will observe a few of them in the slides now. Here is a case we observe in 1D. Remember, the decision boundary in 1D is still a point, and in all the earlier cases, like the simplest cases A and B, it is a single point which divides the 1D space into two parts, for class 1 and class 2, or class k and l, or A and B. What do you think a non-linear decision boundary in 1D will be? You would think it should be a point, but in fact it may be more than one point. Think of it as an intersection; I will give a geometric interpretation: a point is the intersection of two straight lines, a typical example being the intersection of the weight vector with the decision boundary, whether the perpendicular bisector or the tilted one, it does not matter; they intersect at a point. Now think of the intersection of a curve with a straight line: the non-linearity has to come in somewhere, because after all we are in 1D, and a non-linear curve can intersect a straight line at two points. So observe this interesting example: look at the two distributions in 1D. What is the covariance matrix here? You have just one dimension, so you have a single variance term for each class, and these are the two class means for classes omega_1 and omega_2; the class variance for class 1 is much larger than that for class 2. The priors also have a role to play, which is not given here. You can see that this region is R2, because the class-conditional density for class 2 is larger than for class 1 there; think of the Bayes theorem and the decision rule: you assign a point to the class whose class-conditional probability is larger. So here class 2 wins; here class 1 wins; here too class 1 has a larger probability density, so this region belongs to R1, this belongs to R1, and in between you have the region R2. The same idea applies to the plot on the right, which is a case of completely overlapping classes: the two class means are identical, and you should actually expect a lot of errors; whatever classifier you design, you are going to make, in general, 50% or more errors on average in classification. But let us look at the decision boundaries on the left by this argument; maybe the red mark is not drawn that well, and there could also be an effect of the class priors.
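To see the two-point boundary concretely, here is a sketch that solves g_1(x) = g_2(x) in 1D; with unequal variances this is a quadratic in x, hence up to two roots. All numbers are illustrative:

```python
import numpy as np

# 1D two-Gaussian boundary: g1(x) = g2(x) reduces to a quadratic in x
# when the variances differ, so there can be two boundary points.
mu1, s1, p1 = 0.0, 2.0, 0.5    # class 1: large variance
mu2, s2, p2 = 1.0, 0.5, 0.5    # class 2: small variance

a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
b = mu1 / s1**2 - mu2 / s2**2
c = (mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2)
     + 0.5 * np.log(s2**2 / s1**2) + np.log(p1 / p2))
print(np.roots([a, b, c]))     # two points; class 2 wins between them
```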
From here to here, the class-conditional density, the PDF of class 1, is greater than that of class 2, so this region belongs to R1; but on both sides class 2 wins, so you have region R2 there, and therefore two points as the decision boundary. These figures show examples. Look at the figure at the bottom, which is similar to what I was drawing: you have the class means for the two classes, and the covariance matrices appear to be diagonal in x and y, because these are not oriented Gaussians; the isocontours are just axis-aligned ellipses, as you can see here in 3D with the corresponding isocontour lines, and look at the resulting decision boundary for the two covariance matrices given in this particular case. Here is another example, with tilted Gaussian distribution functions: the two class means, and the two sets of isocontour lines. Now one of them is oriented and the other is not, so it is possible that the off-diagonal terms are 0 for class 1, while for class 2 the off-diagonal terms are non-zero, because it is tilted. One is just axis-aligned, the other is tilted; sorry, I should be precise here: both are ellipses, mind you, the isocontour lines are ellipses in both cases, but one is an asymmetric Gaussian, an axis-aligned ellipse with unequal axes, while the other is a tilted, oriented sort of Gaussian. In this particular case you will again have an oriented, tilted decision boundary, as given here; you will not have a non-linear decision boundary. Now let us look at this example of non-linear decision boundaries for a two-class problem in two dimensions. The Gaussian surfaces show the corresponding distribution functions of the two different classes; it appears like a mixture of Gaussians, but they belong to two different classes. Look at the bottom of the figure, where the isocontour lines and the two class means are shown. The covariance matrices are strictly diagonal, with the off-diagonal terms equal to 0; each is a 2x2 matrix, since we are in 2D. For class 1, the corresponding diagonal terms are the same, giving us circular isocontours, which would be spheres in 3D or hyperspheres in d dimensions. For class 2 (remember the subscript indicates the class ID, which is class 2 here), the variance along the x direction, the first dimension, is more than that along the second; strictly speaking these may be standard deviations, but you can take them as variances by squaring. So this is a scenario where you can see two discriminating boundary branches, creating what is called a hyperbolic decision boundary in 2D; in 3D and even higher dimensions they would create hyperboloidal surfaces. If you are in between, the assignment is to R1, but if you are here or there, you need to assign to class R2. This is g_1, this is the corresponding g_2, and g_1 - g_2 = 0 gives you this sort of decision boundary. Again a simple case: the variance for class 1 along the x direction, which is this one, is the same as that along y for the second class.
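A quick way to visualize such a hyperbolic region numerically; the diagonal covariances below, with the variances swapped between the classes, are illustrative stand-ins for the figure's values:

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """Quadratic discriminant g_i(x) for one class (same form as the sketch above)."""
    d = x - np.asarray(mu, float)
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.linalg.slogdet(Sigma)[1] + np.log(prior))

# Illustrative diagonal covariances with the variances swapped between classes.
mu1, S1 = [-1.0, 0.0], np.diag([1.0, 4.0])
mu2, S2 = [ 1.0, 0.0], np.diag([4.0, 1.0])

# Sample the winning class on a coarse grid: the two regions are separated
# by the branches of a hyperbola-like curve, as in the figure.
for y in np.linspace(3.0, -3.0, 7):
    print("".join("1" if g(np.array([x, y]), mu1, S1, 0.5)
                       > g(np.array([x, y]), mu2, S2, 0.5) else "2"
                  for x in np.linspace(-4.0, 4.0, 33)))
```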
I repeat: the variance, the diagonal term, for class 1 along the x direction is the same as that for the second class along the y direction, assuming x is the first direction and y the second; and, looking at the other one, the variance along y for the first class is the same as the variance along x for the second class. The off-diagonal terms are 0, indicating diagonal covariance matrices, and look at the means: they are shifted along x and y by the same amount. Basically, if you look at the positions of the class means, the differences in the x and y directions have to be the same to create this degenerate kind of decision boundary, the asymptotic limit of the hyperbolic case. Although the individual isocontour lines are elliptical for both class 1 and class 2, they are oriented orthogonally to each other, and the two decision boundary branches are as in the hyperbolic case we had earlier. When you are in this region or this one, you assign to class 1; when you are here or there, you assign to class 2. So this is a special case of the hyperbolic decision boundary, in which the two corresponding branches come very close together: the decision boundary you observe here is a special case of a hyperbola called a rectangular hyperbola. The boundaries appear to be linear, but it is a very special geometrical case of the hyperbolic non-linear decision boundary, which degenerates into this particular form of two intersecting lines, a pair of intersecting lines, called a rectangular hyperbola. We will stop here the discussion on discriminant functions built from the concepts of the Bayes decision rule, the Bayes theorem, and the multivariate Gaussian density function, which gave us the Mahalanobis distance criterion. From there we saw two cases which provide us with linear decision boundaries, while the last case, with an arbitrary sigma different for different classes, no longer the same class-wise, provides non-linear decision boundaries in general. We showed the examples in 2D, and you have to visualize how things will happen in three dimensions, or even in higher dimensions. Thank you very much.