Welcome to today's class. So we are in module 3, and we are trying to understand image classification. The tagline of this module is: if I cannot see the features, how do I classify them? This is with respect to microwave imagery. The contents of module 3 were discussed as part of the previous lecture, but just to reiterate: we are trying to learn classification of SAR intensity images, we will try to understand the different types of classification, supervised as well as unsupervised, and we will also get introduced to fuzzy classification, okay. From the previous lectures we have a basic understanding of what classification is, isn't it? It consists of two steps: number one, recognizing or identifying real world objects. By real world objects I mean a water body, vegetation, buildings, that is, built-up area, and so on. The second step in classification is labeling of the pixels to be classified. Remember the example I showed you in the previous lecture, where we discussed that labeling in the literature is conducted by two main processes, supervised and unsupervised methods. We will discuss more details about that shortly.

Now, a quick recap: we have covered till now the geometrical basis of classification, and we also learned what is meant by minimum distance to mean classification and parallelepiped classification. We discussed that in minimum distance to mean classification, each time the distance in Euclidean space is calculated between the pixel to be classified and the centre, or mean, of a cluster, that is, a group of pixels which lie close together in Euclidean space and form part of a cluster, okay. We also discussed the advantage of using the distance formula: it is simple, easy to understand, easy to compute, and computationally less taxing, isn't it? After all, we are only using the distance formula. Here μ_ck and μ_cl refer to the mean of a cluster of points, where c refers to the class and k and l refer to the bands, and DN refers to the digital number, the pixel value; so, with two bands k and l, the distance of a pixel from class c is d = √((DN_k − μ_ck)² + (DN_l − μ_cl)²), the minimum distance to mean classifier.

Now, towards the right hand side, you see a summary of what we discussed as part of the previous lecture, that is, parallelepiped classification. This is based on decision rules. Here the difference is that we use not only the mean but also the variance of each class or cluster, and the decision rule is defined as highlighted here, where you see the lower boundary and the upper boundary, μ − σ and μ + σ, that is, a one standard deviation threshold, okay. We also discussed one more thing, a disadvantage of the parallelepiped classification method. Now, instead of hearing me speak about it, let me show you a diagram. This diagram has three axes: the catchment area on the x-axis, and the other two axes showing the river discharge and the elevation above mean sea level. If you compare with the similar diagram you may have seen in the earlier lecture, you can point out the difference: here I have increased the number of data points, that is, the green circles are more in number, isn't it? And I have done this just to highlight the disadvantage of the classification algorithm. Keep the figure in mind; a short sketch of the parallelepiped decision rule follows below, and then we will come back to these points.
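As a quick illustration of the decision rule just described, here is a minimal Python sketch; this is not code from the lecture, and the band values, means and the one-standard-deviation threshold are made-up placeholders. The parallelepiped test simply checks, band by band, whether a pixel falls inside the box μ ± σ of a class:

```python
import numpy as np

def in_parallelepiped(pixel, class_mean, class_std, k=1.0):
    """True if, in every band, mu - k*sigma <= DN <= mu + k*sigma."""
    lower = class_mean - k * class_std
    upper = class_mean + k * class_std
    return bool(np.all((pixel >= lower) & (pixel <= upper)))

# Hypothetical two-band example: the same pixel can pass the box test of
# two different classes at once, which is exactly the ambiguity shown in the diagram.
pixel = np.array([52.0, 48.0])
mu_1, sd_1 = np.array([50.0, 45.0]), np.array([5.0, 6.0])
mu_2, sd_2 = np.array([55.0, 50.0]), np.array([6.0, 5.0])
print(in_parallelepiped(pixel, mu_1, sd_1))  # True
print(in_parallelepiped(pixel, mu_2, sd_2))  # True -> which class do we assign?
```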
Coming back to the diagram: in particular, you can see two pixels specially highlighted in red, isn't it? Because through the parallelepiped classifier, instead of defining or enclosing the clustered points in a circle or an ellipse, I am enclosing them using a parallelepiped, okay. But then, for the points in red, I am slightly confused, because I do not know which class to assign these pixels to. Say I call the first cluster of points cluster 1 and the second group of points cluster 2, okay. Now I can say that all the points that lie inside a parallelepiped belong to that particular cluster, which means all these points that lie inside the first parallelepiped belong to what we have already named cluster 1, and all these points belong to cluster 2. But then I am confused, because these red points lie in both cluster 1 as well as cluster 2; the parallelepipeds themselves are overlapping, okay. In the presence of covariance, the rectangular decision regions of a parallelepiped classifier tend to fit the training data very poorly, okay. Which means we should now look for a classification method that considers the covariance also; a classification method that considers covariance as well will be more efficient at classifying pixels, isn't it?

Now, before we proceed to the next classification technique, let us try to understand what covariance is. Covariance between two random variables is a measure of the nature of the association between the two. To understand, say I give you a pair of value sets, one set of values representing x and the other set representing y, with the same number of data points in each, say 100 values in both x and y, and I ask you to find the covariance between these two random variables x and y. Write the mean of x as μ_x and the mean of y as μ_y. Now, if large values of x often result in large values of y, or small values of x in small values of y, then a positive (x − μ_x) will often come with a positive (y − μ_y), and a negative (x − μ_x) with a negative (y − μ_y). We can call this case one: both variables are moving together. By covariance we want to estimate how two variables move together; you can visualize this in the Cartesian coordinate system, the Euclidean space. All right.

Now let me give you the reverse scenario, case two: say large values of x often result in small values of y. Then let me pose a question to you: if large values of x result in smaller values of y, what will be the sign of (x − μ_x)(y − μ_y)? Think about it for one second. If large values of x result in small values of y, the product (x − μ_x)(y − μ_y) will be negative, isn't it? Which means the sign of the covariance indicates whether the relationship between two dependent random variables is positive or negative. Let me repeat.
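To make cases one and two concrete, here is a tiny numpy sketch; the numbers are made up purely for illustration. In the first pair, y tends to rise with x and the sample covariance comes out positive; in the second pair, y falls as x rises and the covariance comes out negative:

```python
import numpy as np

x      = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up   = np.array([2.0, 4.0, 5.0, 4.0, 6.0])  # tends to rise with x
y_down = np.array([9.0, 7.0, 6.0, 4.0, 2.0])  # tends to fall as x rises

def cov(a, b):
    """Sample covariance: sum((a - mean_a) * (b - mean_b)) / (n - 1)."""
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

print(cov(x, y_up))    # positive: the variables move together (case one)
print(cov(x, y_down))  # negative: the variables move oppositely (case two)
```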
The sign of the covariance, okay, that is, how the two variables co-vary, move together: the sign of the covariance indicates whether the relationship between two dependent random variables is positive or negative. And when x and y are statistically independent, it can be shown that the covariance is 0, but the converse, however, is not generally true. I won't get much into the details, but I hope you understand the concept: if I give you the digital numbers, that is, the pixel values, of two bands of satellite imagery, both captured over the same geographical extent at the same time by the same satellite, which means they will have the same resolution, then I can easily find the covariance between the pixel values of the two bands, the covariance of two random variables x and y, okay.

So, given here is the expression to compute the covariance. During your tutorials, you may have covered a tutorial on image statistics using Python, so please try to recollect those explanations. Once again, the covariance between two bands, where k and j refer to the bands, can be written as

cov(j, k) = Σᵢ (DNᵢⱼ − μⱼ)(DNᵢₖ − μₖ) / (n − 1),

where n is the total number of samples in the image; we divide by n − 1 in the denominator to make the estimate unbiased, all right. A short Python sketch of this computation appears just after this passage. So again, why are we learning all this? Because we understood that the parallelepiped classifier has a disadvantage: it will not perform well when the set of data points has high covariance, okay. And then we understood how to compute the covariance between two random variables. One important point I have highlighted here is that the sign of the covariance, whether it is positive or negative, indicates whether the relationship between the two random variables is positive or negative, okay. The sign is important, okay, the sign of the covariance.

Now, just to reiterate, we learned about covariance because we need to understand one supervised classification method which uses the mean, the variance, as well as the covariance to classify pixels into clusters, into groups, okay. To be more specific, we shall now begin to understand the third supervised classification method, known as the maximum likelihood classifier, okay, the maximum likelihood classification method. Again, before we actually go into the definition of maximum likelihood classification, let us first try to clear a few basics in probability.

Consider an example of a discrete random variable, okay, and say the variable X represents the number of times an event occurs. How many times does it rain in Mumbai? Let the variable X represent that, and the table that you see gives the possible values of X and their probabilities. So I have the value of X as 0, and the probability of X being 0 is 1/3; the next value of X is 1, and the probability of X being 1 is 1/2; again, for X being 3, the probability is 1/6, okay. Let it be any event: the variable X represents the number of times the event occurs, and in the table I have given you the different values that X can take and the probability of each value. Then the set of ordered pairs (x, f(x)) is called the probability function or probability distribution of the discrete random variable X, okay.
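As referenced above, here is a minimal sketch of that band-covariance expression; the DN arrays below are hypothetical stand-ins for the same n pixels read from two co-registered bands:

```python
import numpy as np

# Hypothetical digital numbers for the same n pixels in two co-registered bands j and k
dn_j = np.array([132.0, 137.0, 139.0, 140.0, 135.0, 138.0])
dn_k = np.array([ 60.0,  64.0,  66.0,  65.0,  61.0,  63.0])

n = len(dn_j)
mu_j, mu_k = dn_j.mean(), dn_k.mean()

# cov(j, k) = sum_i (DN_ij - mu_j) * (DN_ik - mu_k) / (n - 1), the unbiased estimate
cov_jk = np.sum((dn_j - mu_j) * (dn_k - mu_k)) / (n - 1)
print(cov_jk)
print(np.cov(dn_j, dn_k, ddof=1)[0, 1])  # the same value via numpy's built-in
```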
Returning to probability: once again, the probability distribution of the discrete random variable X gives, for each possible outcome, small x, the probability f(x), as shown, all right. Now, for a continuous random variable, f(x) is usually called a probability density function, and a probability density function is constructed so that the area under its curve, bounded by the x-axis, is equal to 1 when computed over the range of x for which f(x) is defined. I am assuming that some background in probability is available with all of us, okay, but just in case you have forgotten the basics, this is to help you brush up, okay. The function f(x) is a probability density function for the continuous random variable X, defined over the set of real numbers, if these three conditions are satisfied: f(x) ≥ 0 for all x; the total area under the curve over the whole range of x equals 1; and P(a < X < b) is the area under the curve between a and b. I will give it some time to sink in, all right.

So, coming to why we are learning all this, I will leave you with one more concept, the Gaussian distribution. The density function of the normal random variable X, with mean μ and variance σ², is given by the relationship

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)),

the Gaussian distribution, okay. And now, coming to the maximum likelihood classification algorithm. The maximum likelihood decision rule assigns a pixel to the class for which its likelihood value is largest, maximum. Maximum likelihood classification of remotely sensed data involves considerable computational effort, as you can see from this relationship, but the method is considered accurate, the most accurate in fact. For each pixel, its likelihood of belonging to a particular class is computed using the probability density function of the normal curve, which is parametrized by the class mean and the covariance matrix.

Now, instead of listening to me speak, let us try to understand maximum likelihood classification through a numerical example, okay. In front of you, you can see the question displayed: the training data for two classes are given, one the class sand and the second the class water, okay. The information is given in three bands, which I am going to call B1, B2 and B3 to signify band 1, band 2 and band 3, okay. Towards your right side, highlighted in yellow, are the values for two pixels whose pixel values in the three bands B1, B2, B3 are given; these pixels have the names data ID 1 and data ID 2, okay. Now the question is: given the training data information, which class do these unknown pixels belong to? Does data ID 1 belong to sand or to water? Similarly, does data ID 2 belong to sand or to water? That is the first question. The next question reads: can you estimate this using the minimum distance to mean classifier and the maximum likelihood classifier, okay?

So, as before, let us first understand what is given in the question: the training data values in three bands B1, B2, B3 for the two classes, sand and water, okay. And we have already seen, as part of an earlier lecture, how to compute the minimum distance to means using the distance formula in Euclidean space, isn't it? Which means I need to compute the mean of each class. A small Python sketch of this distance computation follows below.
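Before walking through the numbers, here is a minimal sketch of the minimum distance to mean computation we are about to do. The training matrices and the unknown pixel are placeholders, not the values from the slide:

```python
import numpy as np

# Hypothetical 3-band training data (one row per training pixel) for the two classes
sand  = np.array([[132, 120, 125], [137, 122, 128], [139, 125, 130], [140, 124, 129]], dtype=float)
water = np.array([[ 65,  70,  60], [ 68,  72,  63], [ 66,  69,  61], [ 70,  74,  65]], dtype=float)

mu_sand  = sand.mean(axis=0)   # one mean per band
mu_water = water.mean(axis=0)

pixel = np.array([135.0, 121.0, 126.0])  # stand-in for an unknown pixel such as data ID 1

d_sand  = np.linalg.norm(pixel - mu_sand)   # Euclidean distance to the class mean
d_water = np.linalg.norm(pixel - mu_water)

# Assign the pixel to the nearest class mean
label = "sand" if d_sand < d_water else "water"
print(d_sand, d_water, label)
```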
Back to the numbers: in this case, the mean will have three values, one for each band, isn't it? The mean for class water is also computed, and in the minimum distance to mean classifier, the distance from each class mean to the unknown pixel is computed. That means I need to calculate the distance of data ID 1 from μ of sand and the distance of data ID 1 from μ of water, and that particular pixel is going to be assigned to the closest cluster, okay. Which means that if the distance of the pixel from μ of sand is the least, the pixel is going to be assigned to the class sand, and likewise. As we have already covered the computations for minimum distance to means classification, I won't repeat them, but I have the values for you to compute and then compare, okay. Every time, I compute the distance between data ID 1, as seen on your screen, and the mean of class sand and of class water. Here, the distance between data ID 1 and the mean of class sand is 9.75, and the distance between data ID 1 and the mean of class water comes out to be 70.16. The lowest being 9.75, data ID 1 is assigned to class sand. Similarly, we can compute the distance between data ID 2 and the mean of class sand, which comes out to be 16.81, as seen on your screen, and the distance between data ID 2 and the mean of class water, which comes out to be 76.77. Again, the lowest is 16.81, so data ID 2 is also assigned to class sand, okay.

So, what we will do now is solve using the next classifier; let us try to understand maximum likelihood classification. We already know the relationship: the maximum likelihood is computed using the mean of each class, the digital numbers, the variance-covariance matrix, and n, the number of data samples, okay. We have all the information with us, except that we now need to compute the variance-covariance matrix; the mean of each class has already been computed as part of the earlier solution, is it not? So let us estimate the variance-covariance matrix, okay. The variance-covariance matrix of three bands of a class shall have size 3 × 3. The first element is the variance of band 1 with band 1, which I can call σ₁₁ or σ₁²; then band 1 with band 2, which I can call σ₁₂; band 1 with band 3, which I can call σ₁₃; and so on, okay.

Now, how do we compute the variance-covariance matrix of class sand, given the training data information and the mean of class sand? Say I want to compute the first value, σ₁², okay. So what do we do? Every time, we subtract the mean from the value, okay: 132 − 138.87, okay. What I will do is the computations for just one band, and then assume that you are able to follow the rest. So coming to the third value, it is going to be 137 − 138.87; similarly 139 − 138.87, 140 − 138.87, and so on. So what am I doing? Every time I am subtracting the mean from the value, okay, because it is σ₁², the first element of the variance-covariance matrix, isn't it, okay. And then I take the summation of the squared differences to get a value. I have pre-computed it; you can check if the values are correct. Then, to calculate the first element, σ₁₁ or σ₁², it is going to be 288.81 / (n − 1). Why minus 1? To make it unbiased; n is the total number of samples. So you will get a value of 41.25. Remember, this is for the variance-covariance matrix of class sand, which has a matrix size of 3 × 3; a compact sketch of this matrix computation follows below.
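As a minimal sketch of that variance-covariance computation, here the training matrix is again a placeholder of 8 pixels × 3 bands, not the slide's values:

```python
import numpy as np

# Hypothetical training data for one class: 8 pixels x 3 bands
sand = np.array([
    [132, 120, 125], [137, 122, 128], [139, 125, 130], [140, 124, 129],
    [141, 126, 131], [142, 125, 130], [140, 123, 127], [140, 122, 126],
], dtype=float)

n = sand.shape[0]
mu = sand.mean(axis=0)  # per-band mean

# sigma_1^2 = sum((DN_band1 - mu_1)^2) / (n - 1): the first diagonal element
sigma_11 = np.sum((sand[:, 0] - mu[0]) ** 2) / (n - 1)

# Full 3x3 variance-covariance matrix; ddof=1 gives the unbiased (n - 1) divisor
cov_matrix = np.cov(sand, rowvar=False, ddof=1)
print(sigma_11, cov_matrix[0, 0])  # the two agree
```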
And what we just did is compute that first element, σ₁², which comes out to be 41.25. Now, similarly, you can compute the variance-covariance matrix for class water also, okay. Then, before we compute the likelihood function, we need to estimate the determinant of the variance-covariance matrix, okay, the determinant, and we need to take the inverse of the variance-covariance matrix, that is this term. After you have computed both the determinant and the inverse, we can start with the computation of the maximum likelihood function. I completed the calculations, and once you complete them, you get the negative log of the probabilities: for data ID 1, when the class is sand, these are the values I was getting, and when the class is water, these are the values I was getting. The least belongs to class sand, which means data ID 1 is assigned to the class sand. Similarly, I can compute the negative log of probability for data ID 2 when the class is sand and when the class is water; the least belongs to the sand class, and therefore data ID 2 is assigned to the class sand as well, all right. A compact sketch of the discriminant behind these negative log probabilities appears after this passage. So, we already knew how to solve the problem using minimum distance to means classification, okay, which is why that was not covered again here; right now, we saw how to manually compute and classify a pixel using the maximum likelihood classification approach.

Now, let me try to confuse you a bit, okay. I have expanded the same question by adding two more classes, so I am going to call it question 2. Given here are the training data for four classes, namely sand, water, urban and vegetation; it is the same question, I have just added two more classes, and the values of the pixels are given in three bands, band 1, band 2 and band 3, denoted by B1, B2 and B3 respectively. Again, the two pixels which I will call unknown pixels have their values given against data ID 1 and data ID 2. So the first question here is which class these unknown pixels belong to, and this time, remember, there are four options to choose from: it can be sand, water, urban or vegetation. The second question is: can you estimate this using the minimum distance to mean classifier and the maximum likelihood classifier? Now, just imagine the amount of computation you have to carry out to estimate the negative log probabilities for each and every pixel against each class, which means I need to calculate the probabilities of data ID 1 belonging to all four classes, sand, water, urban and vegetation, and similarly I need to compute the maximum likelihood of data ID 2 belonging to class sand, water, urban and vegetation.

So let us see how to solve this. What I will do is play a video wherein the solutions are made available in an Excel sheet; the computations have been done in Excel. You can see that the training data exist for all four classes, the unknown data for classification are shown here, data ID 1 and 2, with their values in three bands, and now, coming to the calculations, as before you can compute the band means. You will have four times the computations here, because we have four classes. Using the minimum distance to means classifier, you estimate the distances: the distance of data ID 1 to sand, and similarly to water, urban and vegetation; likewise, the distance of data ID 2 to all four classes.
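Here is a minimal sketch of the maximum likelihood discriminant that those steps implement. Assuming a Gaussian class model and equal priors, the negative log likelihood of a pixel x for class c reduces, up to a constant, to ½·ln|Σ_c| + ½·(x − μ_c)ᵀ Σ_c⁻¹ (x − μ_c), and the pixel goes to the class with the smallest value. The class means and covariance matrices below are placeholders, not the worked example's values:

```python
import numpy as np

def neg_log_likelihood(x, mu, cov):
    """0.5*ln|cov| + 0.5*(x - mu)^T cov^{-1} (x - mu), constants dropped."""
    diff = x - mu
    sign, logdet = np.linalg.slogdet(cov)  # numerically stable log-determinant
    return 0.5 * logdet + 0.5 * diff @ np.linalg.inv(cov) @ diff

# Placeholder class statistics (3 bands); in practice these come from the training data
classes = {
    "sand":  (np.array([138.9, 123.4, 128.3]), np.diag([41.25, 30.0, 25.0])),
    "water": (np.array([ 67.2,  71.3,  62.3]), np.diag([10.0, 12.0,  9.0])),
}

pixel = np.array([135.0, 121.0, 126.0])
scores = {name: neg_log_likelihood(pixel, mu, cov) for name, (mu, cov) in classes.items()}
print(scores)
print(min(scores, key=scores.get))  # assign to the class with the lowest negative log likelihood
```

With four classes, the dictionary simply gains two more entries, urban and vegetation, and the same loop picks the minimum among four scores.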
Back in the spreadsheet: again, we can compute the variance-covariance matrix, which is going to be a 3 × 3 matrix for each of the four classes, because there are three bands. I can estimate the determinant of each variance-covariance matrix and take its inverse, because maximum likelihood classification requires the variance-covariance matrix, its determinant, as well as its inverse. So, once I compute the inverse, I can go directly to the calculation of x minus the mean. Following these steps, I ultimately get to compute the negative log of the probabilities. You can see pixel IDs 1 and 2: for the two pixel IDs together I have eight values of the negative log of probability, four for data ID 1, one per class, and similarly four for data ID 2. Finally, I am going to assign each unknown pixel to the class for which the negative log of probability is lowest. Remember, it is not the probability but the negative log of the probability, which is why I am picking the lowest. Also, when you are doing the computations in an Excel sheet, please be mindful of the relationship it uses to compute covariance, whether the division is by n, the total number of values, or by n − 1 to make it unbiased; for instance, Excel's COVAR and COVARIANCE.P divide by n, while COVARIANCE.S divides by n − 1.

So, as part of this lecture, we tried to understand the maximum likelihood classification algorithm, and we also tried to manually compute the values for a small sample of data having eight values. Remember, the typical thumb rule is that you need more than 30 values to call it a large sample, but just for representation purposes we manually computed and solved the question using a set of eight values. I hope that you understood the concept and followed the calculations. I will see you in the next class. Thank you so much.