This is the Lesson 8 lecture, and its focus is thematic map accuracy assessment. There are many sources of error in remotely sensed data. In particular, there is the positional error of the imagery that is acquired, which is measured as root mean square error, or RMSE, and you folks have learned about this idea in previous coursework. In this lesson we will focus on thematic map error, that is, the error in the classified maps that you have developed from the remotely sensed data. The accuracy and precision with which the land cover is classified is the focus of this lesson. Errors in remotely sensed data involve the concepts of both accuracy and precision, so please watch the following short video about the difference between the two.
Information derived from remotely sensed data is important for environmental models at local, regional and global scales. The remote-sensing-derived thematic information, that is, the classifications that you get from the remotely sensed data, may be in the form of thematic maps or of statistics derived from those maps using area frame sampling techniques. The thematic extraction must be accurate because important decisions are made throughout the world using this information. Unfortunately, you need to be very aware that thematic information always contains some error. Scientists who create remote-sensing-derived thematic information should recognize the sources of error, minimize the error as much as possible, and inform the user how much confidence he or she should have in the thematic information. Remote-sensing-derived thematic maps should normally be subjected to a thorough accuracy assessment before being used in scientific investigations and policy decisions.
There are many different sources of error in remote-sensing-derived information. This graphic summarizes a remote sensing workflow and points out how error creeps into the process and accumulates all the way to the development of the land cover map. Error accumulation begins right at the time of data acquisition. It has to do with the sensor system, its design and its constraints, with the platform movement or vibrations that might be happening at the time of data acquisition, and with the ground control points that have been taken to make sure that the imagery gets locked on to the right place on the ground. There are also scene considerations that add to the error: illumination conditions matter and can be variable, topography matters, and where there is topographic relief the look angle matters as well. All of these add a little bit of error to the remotely sensed data that is collected. Then we go to the preprocessing stage, where there is geometric correction, radiometric correction, data conversion such as changing coordinate systems, and resampling, and all of that introduces a little bit more error. Then we come to the information extraction stage, where we classify the remotely sensed data into thematic maps.
At this stage there is a qualitative analysis, where you judge how good your map is by visual inspection, and a quantitative analysis, as we will come to see in this lesson. There is data generalization involved as well, where you have a class that can be broken up into subclasses and so forth. Then, once you have your classified map, there is data conversion again, raster to vector or vector to raster. Then we come to the error assessment of the thematic map that has been produced. This involves many different issues: the sampling design, that is, how you should sample the points that you need for accuracy assessment from your map; how many samples you need per class; what the sample locational accuracy is; spatial autocorrelation techniques, which can be used in the error assessment process; and the error matrix, or confusion matrix, which will be the main focus of this lesson. There are other discrete multivariate statistical techniques used for error assessment that are beyond the scope of this course, and there are also reporting standards for error assessment.
In the end this gives you a final classified map product, and this map product has both spatial error and thematic error. The map product then goes to work in the public policy domain, where the data has been converted to information to enable good decision-making, and then the decisions are implemented. This gives you the total cycle of the remote sensing workflow and highlights how errors creep into the process at each stage. A good remote sensing scientist needs to keep an eye on this and quantify the errors as much as possible, so that when the product is given to the decision makers you can also give them an idea of how accurate and reliable your product is.
To begin with, it is necessary to clearly state the nature of the thematic accuracy assessment problem at hand. This includes what the accuracy assessment is expected to accomplish, what the classes of interest are and whether they are discrete or continuous, and the sampling design and sampling frame that you are going to use. The sampling frame, once again, means the extent from which you are going to extract your reference points, the points that you need for the accuracy assessment.
Here are the general steps to assess the accuracy of thematic information derived from remotely sensed data. Please look through this list carefully. We have looked at many of these issues before, and some of these ideas you have seen in previous courses. We always begin by stating the nature of the thematic accuracy assessment required in the project that you are working on. That means you state what the accuracy assessment is expected to accomplish, identify the classes of interest, and specify the sampling frame within the sampling design. The area frame is the geographic region of interest, and the list frame consists of the points or polygons that you will use as sampling units in your accuracy assessment. Then you select a method for the accuracy assessment. It could be qualitative, and most often there will be a quantitative element as well, such as developing an error matrix. Then you need a sufficient number of observations in the sample, so you need to decide on the number of observations per class. There is a lot of research literature available on this topic, and the answer depends on the type of project that you are working on. Then you select the sampling design, that is, whether the reference points are selected at random, systematically, and so forth.
Please read through this in Chapter 14 of your textbook. Hopefully you have seen these ideas before, and you will be dealing with these concepts in the Lesson 8 lab as well. Next you obtain the ground reference data according to the design that you set up, which also involves an evaluation protocol and a labeling protocol. Then we come to error matrix creation and analysis. After the error matrix has been created, we get the producer's accuracy, the user's accuracy and the overall accuracy. At that point we can accept or reject the previously stated hypothesis about what we were trying to accomplish in the project and how well we were able to accomplish it, and we distribute the results if the accuracy is acceptable. The results would be the accuracy assessment report, the digital products, the maps, the analog or hard copy products, and an image and map lineage report, or metadata. Very important.
So this is what an error matrix looks like. Hopefully all of you have seen this in your previous coursework. The ground reference test information is all in columns, and the column totals are what you use to compute the producer's accuracy. The remote sensing classification information is all in rows, and the row totals are what you use to compute the user's accuracy. For the ground reference test information, the ideal situation is to locate test pixels in the study area, or test polygons if the classification is based on human visual interpretation or on object-based image analysis, which is a very active field whose ideas are still being developed. You have to make sure that these sites are not used to train the classification algorithm, so that they represent unbiased reference information.
Okay, so what that means is that the points you use to train your classification algorithm must be separate from the points you use to test the efficacy of the classification. It is possible to collect some ground reference test information prior to the classification, preferably at the same time as the training data. But the majority of test reference information is often collected after the classification has been performed, using a random sample to collect the appropriate number of unbiased observations per category, or per class. There are basically five common sampling designs used to collect ground reference test data for assessing the accuracy of a remote-sensing-derived thematic map. They are random sampling, systematic sampling, stratified random sampling, stratified systematic unaligned sampling, and cluster sampling. In the next slide I am going to show you some graphics so that you can understand these ideas geometrically.
So here is a diagram of the different sampling methods in an accuracy assessment procedure for thematic maps. You can select your reference points just randomly across the map, or you can have them regularly spaced as in figure B. You can stratify your reference points randomly per class, which is known as stratified random sampling. You can do stratified systematic unaligned sampling, where you lay a grid over the map and put a point at a random position inside each grid cell. Or, as shown in figure E, you can use cluster sampling, for example if you are focusing on the accuracy of one particular class only. After the ground reference test information has been collected from the randomly located sites, the test information is compared pixel by pixel with the information in the remote-sensing-derived classification map.
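To make the sampling designs described above a little more concrete, here is a minimal Python sketch that draws reference pixel locations from a classified raster. Everything in it is an assumption for illustration only: the function names are hypothetical, the class map is synthetic, NumPy is assumed to be available, and the grid-cell function is a simplified stand-in for stratified systematic unaligned sampling. Cluster sampling is left out.

```python
import numpy as np

def simple_random(n_rows, n_cols, n, rng):
    """Random sampling: n distinct pixels anywhere on the map."""
    flat = rng.choice(n_rows * n_cols, size=n, replace=False)
    return np.column_stack(np.unravel_index(flat, (n_rows, n_cols)))

def systematic(n_rows, n_cols, step):
    """Systematic sampling: one pixel every `step` rows and columns."""
    rows, cols = np.meshgrid(np.arange(step // 2, n_rows, step),
                             np.arange(step // 2, n_cols, step),
                             indexing="ij")
    return np.column_stack([rows.ravel(), cols.ravel()])

def stratified_random(class_map, n_per_class, rng):
    """Stratified random sampling: up to n_per_class pixels per map class."""
    samples = {}
    for label in np.unique(class_map):
        rows, cols = np.nonzero(class_map == label)
        k = min(n_per_class, rows.size)
        pick = rng.choice(rows.size, size=k, replace=False)
        samples[int(label)] = np.column_stack([rows[pick], cols[pick]])
    return samples

def grid_cell_random(n_rows, n_cols, cell, rng):
    """One random pixel inside each grid cell (simplified unaligned design)."""
    points = []
    for r0 in range(0, n_rows, cell):
        for c0 in range(0, n_cols, cell):
            r = rng.integers(r0, min(r0 + cell, n_rows))
            c = rng.integers(c0, min(c0 + cell, n_cols))
            points.append((r, c))
    return np.array(points)

# Synthetic 60 x 80 map with three classes; this is made-up data.
rng = np.random.default_rng(0)
class_map = rng.integers(0, 3, size=(60, 80))

pts_random = simple_random(60, 80, 50, rng)
pts_grid = systematic(60, 80, 10)
pts_strat = stratified_random(class_map, 20, rng)
pts_unaligned = grid_cell_random(60, 80, 10, rng)
```

Each function returns (row, column) pixel positions. In a real project these would still have to be converted to map coordinates and visited or interpreted to obtain the reference class.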
Comparing pixel by pixel means that you look at the reference pixel on the ground and note the land cover class there, and then you look at the same pixel on the map and see whether the same land cover class is represented there or not. For a very good map you would expect, in large part, agreement between the pixels on the map and the corresponding pixels on the ground. Agreement and disagreement are summarized in the cells of the error matrix. The information in the error matrix may be evaluated using simple descriptive statistics like the user's accuracy, producer's accuracy and overall accuracy, or using multivariate analytical statistical measures like khat, the coefficient of agreement.
This slide contains the descriptive statistics that are computed from the error matrix. These are the overall accuracy and then the accuracy of each individual category, which means each individual class, from the point of view of the producer and of the user. The producer's accuracy is a measure of the error of omission: it is the likelihood that a reference pixel of a given class on the ground is correctly classified on the map. The user's accuracy, or reliability, is a measure of the error of commission: it is the likelihood that a pixel classified on the map actually represents that category on the ground. Please read through these ideas in your textbook. Then we come to the multivariate statistic khat, or kappa hat, which is also known as the coefficient of agreement. It is a measure of the agreement or accuracy between the remote-sensing-derived classification map and the reference data. To get a qualitative feel for the definition of khat, it is really the difference between the observed agreement and the agreement expected by chance, divided by one minus the expected agreement. It typically gives you a value somewhere between zero and one. So, for example, if your khat was 0.7, that means the kappa coefficient of agreement for that map product is 70%.
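As a rough illustration of the statistics just described, the following Python sketch computes the overall accuracy, the producer's and user's accuracies, and khat from an error matrix. It assumes the layout used in this lecture, with map classes in rows and reference classes in columns, and it assumes every class has at least one sample so that no total is zero. The function name and the small two-class matrix are made up for illustration and do not come from the textbook.

```python
import numpy as np

def accuracy_statistics(matrix):
    """matrix[i, j] = samples mapped as class i (row) with reference class j (column)."""
    m = np.asarray(matrix, dtype=float)
    n = m.sum()                      # total number of test samples
    diag = np.diag(m)                # samples where map and reference agree
    row_totals = m.sum(axis=1)       # samples per map class
    col_totals = m.sum(axis=0)       # samples per reference class

    overall = diag.sum() / n         # overall accuracy
    producers = diag / col_totals    # producer's accuracy (omission error = 1 - this)
    users = diag / row_totals        # user's accuracy (commission error = 1 - this)

    # Agreement expected by chance, from the row and column totals.
    expected = (row_totals * col_totals).sum() / n**2
    khat = (overall - expected) / (1.0 - expected)
    return overall, producers, users, khat

# Hypothetical two-class error matrix with 100 test samples.
example = [[45, 5],
           [10, 40]]
overall, producers, users, khat = accuracy_statistics(example)
print(f"overall accuracy: {overall:.2f}")
print(f"producer's accuracy: {np.round(producers, 3)}")
print(f"user's accuracy: {np.round(users, 3)}")
print(f"khat: {khat:.2f}")
```

For this made-up matrix the overall accuracy is 0.85 and the chance agreement is 0.50, so khat works out to about 0.70, which is lower than the overall accuracy because part of the agreement could have happened by chance.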
Here is an example error matrix of a classification map derived from hyperspectral data of the Mixed Waste Management Facility on the Savannah River Site. If you look at this matrix, you have the reference data, or the producer's data, in columns, and you have the map data, or the user's data, in rows. The diagonal elements are where the map data and the reference data agree with each other. If you add up all of the numbers along the diagonal and divide by the total number of sample points, that gives you the overall accuracy of the map, that is, the number of correct samples divided by the total number of samples. Similarly, you can calculate the producer's accuracy and the user's accuracy for each class. The producer's accuracy is a measure of the error of omission, and the user's accuracy is a measure of the error of commission. You can see the khat value that has been calculated down here: the kappa value is 92.1 percent, which tells you that this is a very accurate map. Please read through Chapter 14 and understand the definitions of all of these statistics, particularly the overall accuracy and the producer's and user's accuracies for each individual class. You are going to have to do this for the error analysis of your own maps, but you will not have to calculate kappa hat for this lab activity. If you have any questions or comments on the accuracy assessment of thematic maps, please post them in the Lesson 8 general questions and comments discussion forum. Thank you for your attention.