Hello everyone, welcome to the NPTEL course on Remote Sensing and GIS for Rural Development. This is week 9, lecture 1. This is an important course for getting an introduction to the potential of remote sensing data for rural development. Initially, we looked at remote sensing applications for rural development by defining what rural development is, and then we went through the specifics of remote sensing and GIS. Now we have moved into the applications. Let us recap week 8 and then see how it links to week 9. In week 8, we discussed LULC, where LU is land use and LC is land cover. We defined land cover as the cover on the bare skin of the earth: the bare surface is rock and lithology, and whatever lies on top of it is the land cover. Where there is nothing else on top, some classification methods still treat barren land as a land cover class. Then we looked at land use: given a land cover, how is it being used? For example, one person may take a forest and convert it into a conservation area, another may convert it into a school or a monastery, and a third may convert it into a resort. So there is one land cover, but the land use differs. The last point we discussed was land use, land cover change: how does it change over a particular time period, say once every decade or once every 5 years? This is important because we need to understand what land and resources are available for rural development. Since the area is large and the time span is long, you cannot keep collecting field data; instead you use remote sensing and GIS data. Then we looked at how LULC contributes to water and land management, and at the data issues in estimating land use and land cover, especially their change. As I explained, the spatial coverage is large and the analysis is temporal.
There are two aspects to the temporal analysis. One is the span, say 10 years, and the other is that you cannot take just one image a year; you have to sample within the seasons. There are rabi, kharif and zaid, so you take images at least three times a year over 10 years to get quantitative data for assessing LULC change. So the data is expensive, and there are a lot of issues. Then we looked at proxy data that can be used for LULC. Some sources were given, of which ISRO's (the Indian government's) database was used, and we showed many maps and attributes, how to use them, how to document land use and land cover, and how to add overlays. We also looked at on-the-fly remote sensing data for LULC using Google Earth Engine. We compared products for one time frame: Bhuvan's 2015-2016 map, the USGS 2015-2016 product associated with NASA, and Google Earth's 2015-2016 map, which is a collection of data from Sentinel (European Space Agency, ESA) and NASA. That is a lot of data. By comparing them, we clearly saw how forest had been converted to rubber plantation in one part of India. In another part, along the Ganges, we saw a lot of erosion, land being submerged, and land not treated properly because of extensive agricultural activity. Now we are continuing into week 9, and as I promised, we will get to a hands-on tutorial this week. In the next class, we will add a tutorial in which you download a particular dataset and do a very minimal land use, land cover classification. Ten years ago, this would have been a master's project. Now, with the click of a button and quick data downloads, you can do it within 30 minutes, with good accuracy, and we will show you how. We jump back and forth between multiple datasets; in this course, I am not using only one single dataset.
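Comparing two classified maps of the same area, as in the forest-to-rubber example, is often summarized as a change (transition) matrix that counts how many pixels moved from each class to each other class. The following is a minimal sketch, assuming both maps are already classified, co-registered and on the same grid; the class names and the tiny 2×2 maps are hypothetical.

```python
from collections import Counter


def change_matrix(before, after):
    """Count pixel-by-pixel class transitions between two classified maps.

    before, after: 2-D lists of class labels on the same grid
    (for example, maps from two different years).
    Returns a Counter keyed by (old_class, new_class).
    """
    if len(before) != len(after) or any(
        len(r_old) != len(r_new) for r_old, r_new in zip(before, after)
    ):
        raise ValueError("both maps must have the same shape")
    transitions = Counter()
    for row_old, row_new in zip(before, after):
        for old, new in zip(row_old, row_new):
            transitions[(old, new)] += 1
    return transitions


def changed_fraction(transitions):
    """Share of pixels whose class differs between the two dates."""
    total = sum(transitions.values())
    changed = sum(n for (old, new), n in transitions.items() if old != new)
    return changed / total


# Hypothetical 2x2 maps: two forest pixels became rubber plantation.
map_year1 = [["forest", "forest"], ["forest", "water"]]
map_year2 = [["rubber", "forest"], ["rubber", "water"]]
m = change_matrix(map_year1, map_year2)
print(m[("forest", "rubber")])  # 2
print(changed_fraction(m))      # 0.5
```

With real maps, the same counts multiplied by the pixel area (for example 30 m × 30 m for Landsat) give hectares converted from one class to another.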
We jump between datasets because the strength of addressing these complex issues lies in having multi-source, multi-dimensional data. You do not rely only on observation data, and you do not rely only on satellite data; we merge them together and build a data product. So in week 9, we will look at the types of LULC classification. We now know that land use, land cover classification is very important, but we have not yet seen the different types, so let us go through them. Depending on your time, data availability and cost, you can choose which scheme to use. I will mostly focus on open-source data and open-source software so that everyone can do it with just internet access and a computer. This is what we call giving everyone an equal advantage. Earlier, only those who had the software could make these maps. Through this NPTEL course, we are promoting the idea that everyone can map and everyone can contribute to remote sensing and GIS analysis, especially for rural development. As I said, this week we will have one hands-on land use, land cover tutorial. Please go through it carefully and then do it yourself. One recommendation: first reproduce what was discussed in class, and only then try extra data and analysis. This is always how you should learn. We spend a lot of time making sure that the data and the steps are correct, so please reproduce our results first; then you will know how to do it and will be able to do it by yourself. We will also discuss issues in water availability for crop irrigation, especially remote sensing for irrigation. As I said, one of the dominant features of this course is that it is for rural development.
And rural development revolves mostly around agriculture. India is still an agrarian nation, and most of the population depends on agriculture for its livelihood. So if we help farmers improve agricultural productivity and reduce losses, rural development will follow. On that note, we will look at some issues in data availability and at how remote sensing can help: remote sensing for crop growth, acreage and health, using indicators such as NDVI. We will look at how NDVI changes over time, and we will also look at the new, future research directions needed to identify crop type, crop acreage and crop yield. We are slowly getting more into applications, because the basic tools have now been introduced to you. What kinds of applications and research we can do with them is of utmost importance. So let us discuss the types of land use, land cover classification. Depending on which book or reference you use, there are mostly three types. Some books list just two, unsupervised classification and supervised classification; we will teach three. Unsupervised means no class details are available: you have an image and you have to classify it. Let us draw an example. Take a two-by-two image, two rows and two columns, so four pixels. This is the image, and now you have to classify each pixel, each grid cell. How do you classify it? The basic methodology can be divided into three types. The first is unsupervised, where you do not know what green is, what yellow is, what blue is or what red is. Blue could be an ocean, a river or a lake; or, if you fly over Mumbai during the monsoon, you will see half of Mumbai covered with blue sheets. That is not water. What is it?
It is a tarp that keeps water out of the house. Many buildings, even at IIT Bombay, have these blue sheets on top, and they must not be marked as water. Then there is green, which could be a golf course, an irrigated field such as sugarcane, or a park. These are not the same for agricultural development and demand, so you need to identify what each one is. As the diagram shows, the first step is to generate ISO clusters. "Iso" means same, so an ISO cluster groups pixels of the same color, and pixels are separated according to their color. Here I have given four colors. Suppose the computer can distinguish only these four; it will say this is green, blue, yellow and red, but it does not know what they are, because I have not told it, and the computer cannot miraculously decide that green is vegetation. That is unsupervised classification: no class details are available. Even if we say green is a plant, which plant it is remains the classification question. Remember from the previous class that there are multiple greens: broadleaf trees, pine trees, plantations, rubber plantations. All of these are green, but you have to say which green represents forest and which represents rubber plantation. So it is a pure clustering approach: pixels of the same color are clustered by the computer. When you run the tool, the computer takes all the greens of a particular wavelength, a particular color. Green does not mean just one green; there are at least 200 or 300 shades of green. It takes each shade and groups them, or shows you the histogram, the frequency of each color. Then, based on how many classes you ask for, it gives you that many. If I say four here, it gives me four classes.
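The clustering step described above can be sketched with plain k-means, a close relative of the ISODATA algorithm behind ISO cluster tools (ISODATA additionally splits and merges clusters). This is a minimal illustration, assuming each pixel is a tuple of band values such as (red, green, blue); the pixel values are hypothetical, and the output is only cluster numbers, with no meaning attached.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two colour/band vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, k, iterations=20):
    """Group pixel vectors into k clusters with plain k-means.

    pixels: list of tuples such as (red, green, blue) or band reflectances.
    Returns (labels, centroids). Labels are cluster numbers only; they carry
    no meaning such as 'water' or 'vegetation' until an analyst assigns one.
    """
    if not 1 <= k <= len(pixels):
        raise ValueError("k must be between 1 and the number of pixels")
    # Farthest-first seeding: deterministic and spreads the starting centroids.
    centroids = [pixels[0]]
    while len(centroids) < k:
        farthest = max(pixels, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    labels = []
    for _ in range(iterations):
        # Assignment step: every pixel joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in pixels]
        # Update step: move each centroid to the mean colour of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(band) / len(members) for band in zip(*members))
    return labels, centroids


# The 2x2 "image" from the lecture, flattened to four RGB pixels:
# green, blue, yellow, red.
image = [(30, 160, 40), (20, 40, 200), (220, 210, 40), (200, 30, 30)]
labels, _ = kmeans(image, k=4)
print(labels)  # four distinct cluster numbers, one per colour
```

Asking for fewer clusters (for example `k=2`) forces similar colours to merge, which mirrors the "if I say three, it will mix and match" behaviour described in the lecture.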
If I say three, then, because every color is a combination of the primary colors, it will merge pixels and give you three; if I say two, it gives something like black and white. So the computer applies a pure clustering approach: it takes all the greens and clusters them into one group, takes all the blues into another group, and gives you the output. Labels are applied afterwards. We do not know the labels; once the colors come out, we think about them. For example, in the Mumbai aerial view I mentioned, we know that the blue is not water, so I will say that blue is a tarp, because of my field experience. That is personal experience from the field that you can bring in. Similarly, I work a lot in Tamil Nadu, Maharashtra and the Ganges region, and I know, for example, that in a particular area blue cannot suddenly appear as open water; it could be a farm pond. Likewise, an extra-green patch in Chennai cannot be agriculture; it is most likely a golf course or a park. Once the unsupervised classification is done, we move to the next type, supervised classification. In supervised classification, we give labeled input data. For example, I say: group these greens together and call them agriculture, specifically paddy. Once I say that, all these greens are grouped together and used as the paddy class in the classification. The output is a set of clusters with the same labels I provided. If I give five labels, only those five labels appear, never more. Sometimes fewer appear, because a class whose color is not present in the image will not show up.
For example, a river is flowing and I assign blue to water for the classification; but in the non-monsoon peak summer there is no water in the river, so it appears brown and the water class stays empty. So you can end up with fewer classes than you defined, never more. As I said, we start by creating a training set: green is agriculture, blue is water, red is damaged crop. I create that bank, the training set, give it to the computer, that is, a GIS program such as QGIS, and develop the signature file. A signature file records which color represents which class. In the first, unsupervised case, there is no such file, because we do not know the labels or the colors, so we cannot give them. In the supervised case, we give the color plus the name as a code to the computer; that is developing the signature file. Using that file, the computer runs the classification. It clusters all the greens, checks the signature file for what green means, finds that it is agriculture, and labels it agriculture, quite accurately. So supervised classification uses labeled input data and generates clusters with the same labels. You can give the colors to the computer yourself, or you can go to the image and say, I know this area is built-up (urban) land. Then you take that color: you are creating a training sample. Within that small box, the average color and wavelengths of all the pixels are used as the urban signature, and urban is displayed in red.
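The training-set, signature-file and classify steps above can be sketched as a minimum-distance classifier. This is a simplified stand-in, not how QGIS stores or uses signatures: here the "signature" is only the mean vector of each class's training samples (real signature files often also keep covariance for maximum-likelihood classification). The class names and RGB values are hypothetical.

```python
from collections import Counter


def build_signatures(training):
    """Turn training samples into a simple 'signature' per class.

    training: dict mapping class name -> list of pixel vectors taken from the
    training boxes drawn on the image. The signature here is just the mean
    vector of each class.
    """
    return {
        name: tuple(sum(band) / len(samples) for band in zip(*samples))
        for name, samples in training.items()
    }


def classify_pixel(pixel, signatures):
    """Minimum-distance rule: pick the class whose mean is closest."""
    return min(
        signatures,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, signatures[name])),
    )


# Hypothetical RGB training samples.
training = {
    "agriculture": [(40, 170, 50), (50, 180, 60)],
    "water": [(20, 40, 200), (30, 50, 210)],
    "urban": [(150, 150, 150), (160, 160, 160)],
}
signatures = build_signatures(training)
image = [[(45, 175, 55), (25, 45, 205)], [(155, 150, 158), (48, 172, 52)]]
classified = [[classify_pixel(p, signatures) for p in row] for row in image]
pixel_counts = Counter(label for row in classified for label in row)
print(classified)
print(pixel_counts)  # number of pixels assigned to each class
```

Only the classes present in `training` can ever be output, which is why a supervised map never has more classes than you defined.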
So now the classification is applied: all those black and white pixels, which are roads or urban cement cover, are classified as urban and colored accordingly, and the count shown, 278, is the number of pixels with that color. Everything is color coded. That is supervised classification. Next comes semi-supervised classification, which mixes labeled and unlabeled classes. For example, you do not know the color of every pixel in your image. You are sure that in Chennai this is a cement road, this is barren land, this is a golf course, but you are not sure of some greens, because Chennai, Bangalore and Mumbai still have a lot of trees; a green could be a tree or paint on a rooftop. So you apply some labels and leave others untouched; the computer runs and returns unlabeled clusters, which you then label. This mix of labeled and unlabeled classes is semi-supervised classification. Which one is more accurate? The supervised version. However, it is also time consuming, because you need high confidence in your data and have to collect data for all the labels and give it to the computer. There is also cost involved, since you have to go and collect it: collection cost, training cost, and so on. So in this case, the most expensive method is the most accurate. Semi-supervised is also good, because at least part of the labels is correct. The others are not, but once the clusters are formed you can label them, or go to the field and correct them: this is not plantation, it is forest; or this is not forest, it is rubber. So these are the differences between the three types of classification, and the most accurate is supervised classification.
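One simple way to combine a few known labels with unlabeled clusters, as in the semi-supervised case above, is "cluster-then-label": run the clustering, then name each cluster by majority vote of the labeled pixels that fall in it, and leave the rest for the analyst. This is a sketch of that one scheme (other semi-supervised methods exist); the cluster numbers and labels are hypothetical.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, known_labels):
    """Cluster-then-label: name clusters using the few pixels you know.

    cluster_ids: cluster number of every pixel, from an unsupervised run.
    known_labels: dict pixel_index -> class name for pixels you are sure of
    (e.g. 'cement road', 'golf course').
    Returns dict cluster_id -> class name by majority vote, or None when no
    known pixel falls in that cluster (left for the analyst or a field visit).
    """
    votes = defaultdict(Counter)
    for index, name in known_labels.items():
        votes[cluster_ids[index]][name] += 1
    return {
        cid: votes[cid].most_common(1)[0][0] if votes[cid] else None
        for cid in sorted(set(cluster_ids))
    }


cluster_ids = [0, 0, 1, 1, 2, 2]        # hypothetical clustering output
known = {0: "paddy", 3: "cement road"}  # labels we are confident about
print(label_clusters(cluster_ids, known))
# {0: 'paddy', 1: 'cement road', 2: None}
```

Cluster 2 comes back as `None`: that is the "green that could be a tree or rooftop paint" case, which still needs a field check.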
But it also means that all classes have to be labeled, and the computer may take longer to run the analysis. In rural development scenarios and real-life applications, however, you do not have to be extremely accurate, because accurate data takes time and money, so some errors are acceptable. That is why you see many scientific papers using semi-supervised and unsupervised classification. You could say that semi-supervised is a form of supervised classification in which some labels are missing, that is, there are data gaps. That is why some books list only two types, unsupervised and supervised, with semi-supervised falling under supervised when only some labels are known. Let us look at a schematic of the types of LULC classification. Here we have unsupervised, supervised and semi-supervised classification of hyperspectral data, which has many very fine spectral bands; for example, green is divided into 200 greens. In the unsupervised case, you run the unsupervised tool and get clusters; that is one classification. In the supervised case, you give abundant references, where references are labels. Either you let the computer produce clusters and then apply the labels, or you apply the labels first and then run it; either way you get a supervised classification with labels. In the first case you do not know the labels, so you have to figure out what each label is after clustering; here the clusters come with labels. In the semi-supervised case, there are only a few references, so some labels are populated and others are not, and there will be some errors. Supervised classification has the fewest errors because everything is labeled, but it is the most expensive. For unsupervised classification, I would not say it has errors; rather, it has no labels, that is, data gaps.
So you have to add that information later and say which class each cluster is. Let us look at some examples. In this study, Ismail gridded the image, so each pixel is a grid cell, and ground truthing was done for selected cells, not for every pixel; these are the references. As I said, you have to go to the field and collect spectral signatures. The team goes to a grid cell, stands at its center, and records on a phone or an app what is present in that cell. They know the cell size, for example 30 by 30 meters. The first step is to print the image and grid it; then, using the map, go to the same location, stand at the center of the grid cell, as my pointer is here, and note the color and what is there. For example, if the cell is green, I record that it is some type of vegetation, say vegetables, and this one here is forest. They are growing vegetables on the side, because the color is not the same: you have dark green and light green, which is some type of cropping. Cell number 20 is on a border. Cell number 28 is more urban; you can see that it is urban land. This is how you randomize the grid cells. It does not follow a fixed pattern; you cannot say one cell, skip two, take another, skip two. It is random, guided by the spectral signatures. Based on your data, you decide how many samples to collect. If the image is only black and white, with just two colors, you go to the darkest black and the brightest white, take two samples, take a third somewhere in the gray, and you are done. But here there are multiple colors, and within each color there are different shades, so you have to sample each shade and each color. Just look at how many samples have been taken here.
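Choosing grid cells "randomly, guided by the spectral signatures" is commonly done as stratified random sampling: group the cells by their preliminary class or cluster, then draw a random set of cells from each group. The following is a minimal sketch under that assumption; the per-class count, the seed and the small class map are hypothetical and should be set from your own data and budget.

```python
import random
from collections import defaultdict


def stratified_sample(class_map, per_class, seed=42):
    """Pick random grid cells for ground truthing, stratified by class.

    class_map: 2-D list of preliminary class or cluster ids (one per cell).
    per_class: how many cells to visit per class (fewer if a class is small).
    Returns dict class -> list of (row, col) cells, chosen without a fixed
    pattern so the samples are not biased by position.
    """
    rng = random.Random(seed)  # fixed seed so the field plan is reproducible
    cells = defaultdict(list)
    for r, row in enumerate(class_map):
        for c, value in enumerate(row):
            cells[value].append((r, c))
    return {
        value: rng.sample(coords, min(per_class, len(coords)))
        for value, coords in sorted(cells.items())
    }


class_map = [
    ["forest", "forest", "crop"],
    ["forest", "crop", "crop"],
    ["urban", "crop", "water"],
]
plan = stratified_sample(class_map, per_class=2)
for name, cells in plan.items():
    print(name, cells)
```

More shades or classes in the image simply means more strata, and therefore more field samples, which matches the point that a black-and-white image needs only a handful.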
Ideally you would sample every grid cell, but that is too expensive and time consuming, and it is not needed; that is why you use a computer for the classification. Instead, you take representative samples. For example, this cell gives the colors of all its neighbors, and cells 8, 9 and 14 give the colors of all this forest. You can also take duplicates: 8 and 9 may be the same, but you need multiple samples. You can clearly see that in this forest they took a center value, a middle value and boundary values to document the forest. Using this data, the study did a supervised classification, shown on the left, and an unsupervised classification, shown on the right. They were careful about the number of classes and used only five. The first is vegetation, which could include forest, crops, vegetables, anything; all green plants are vegetation, and even a golf course can fall under vegetation. Then there are urban buildings, grassland, water bodies and barren land. Grassland can include natural grass, artificial turf or golf courses. What you see here is that the unsupervised classification is almost perfect: it grouped all the colors together. It did not have the labels, so labels do not appear in the output, but by looking at it and making a quick field visit, you can add them. For example, you now know where all the greens are. Instead of visiting all 1,720 blocks, you visit just one green block, because they are all the same green, and record it as vegetation. Then visit one pink block, since one pink represents all the pinks, and record it as urban; you can take another sample if you want. Do the same for grassland, water bodies and barren land.
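The "visit one block per cluster" idea above can be sketched as: for each cluster, pick the pixel whose color is closest to the cluster's mean color, visit (or inspect) only that pixel, and copy its field label to the whole cluster. Choosing the pixel nearest the mean is an assumption made here for illustration; any clearly typical pixel would do. The pixel values, cluster ids and field labels are hypothetical.

```python
def representative_pixels(pixels, cluster_ids):
    """For each cluster, return the index of the pixel closest to its mean.

    Visiting just this one pixel in the field (or checking it in Google Earth)
    gives a label that can then be applied to the whole cluster.
    """
    groups = {}
    for index, cid in enumerate(cluster_ids):
        groups.setdefault(cid, []).append(index)
    representatives = {}
    for cid, members in groups.items():
        n_bands = len(pixels[members[0]])
        mean = [sum(pixels[i][b] for i in members) / len(members) for b in range(n_bands)]
        representatives[cid] = min(
            members,
            key=lambda i: sum((pixels[i][b] - mean[b]) ** 2 for b in range(n_bands)),
        )
    return representatives


def apply_field_labels(cluster_ids, field_labels):
    """Spread the label recorded at each representative to its whole cluster."""
    return [field_labels[cid] for cid in cluster_ids]


pixels = [(10, 10), (12, 12), (11, 11), (100, 100), (102, 98)]
cluster_ids = [0, 0, 0, 1, 1]
reps = representative_pixels(pixels, cluster_ids)
print(reps)  # pixel 2 represents cluster 0
labels = apply_field_labels(cluster_ids, {0: "vegetation", 1: "urban"})
print(labels)
```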
So this is the strength of combining supervised and unsupervised classification. One result is more detailed: you can see more grass here, some more green there, a larger blue area, and so on. But the other is also helpful, and overall the differences are very small once you translate them into soil management, water management, agricultural productivity and so on. Where does the reference data come from? You go to the field, and I have also taught you a trick you can use: Google Earth. If you go to the same location in the satellite imagery, zoom in and look at it over time, you will know whether that green is agricultural green. This image is a one-time snapshot, and from a single snapshot it is difficult to tell what something is. But if you look at the same location in Google Earth over a period, say once every month, you will see the crop being sown and growing, and then you know it is an agricultural field. If it does not change, it is forest, and if it is a well-maintained green, it is a golf course. You can infer all of this by going to Google Earth and extracting the labels. At the advanced level, there are many more classification methods; let me go through some of them as per this paper. There are different techniques for LULC classification. There are manual classifications, where you go to the field and draw on a map as the British did; that is one type. There are numerical and digital techniques, which is GIS; hybrid techniques, which combine manual and digital; and classification methods based on indicators, which is also GIS. Everything in the right-hand block is GIS, and we will leave it out for now and look only at the advanced techniques. Among the advanced techniques, there is artificial intelligence for LULC.
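The month-by-month Google Earth check described above can also be expressed numerically with NDVI, (NIR − Red) / (NIR + Red): a crop field rises and falls through the season, while forest or a maintained park stays high. The rule below is a rough sketch of that reasoning, not a validated method; both thresholds and the monthly series are hypothetical and would need tuning for your region and sensor. As the lecture notes, forest and a golf course cannot be separated this way alone.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def guess_green_type(monthly_ndvi, amplitude_threshold=0.3, high_threshold=0.6):
    """Rough rule of thumb for a green pixel seen month by month.

    Thresholds are hypothetical and must be tuned for your region and sensor.
    """
    amplitude = max(monthly_ndvi) - min(monthly_ndvi)
    if amplitude >= amplitude_threshold:
        return "cropland"         # sown, grows, harvested: strong seasonal cycle
    if min(monthly_ndvi) >= high_threshold:
        return "perennial green"  # forest, or a well-maintained park/golf course
    return "uncertain"            # needs a field visit


crop = [0.2, 0.25, 0.4, 0.6, 0.75, 0.8, 0.6, 0.3, 0.2, 0.2, 0.25, 0.2]
forest = [0.78, 0.8, 0.79, 0.75, 0.72, 0.74, 0.8, 0.82, 0.81, 0.8, 0.79, 0.78]
print(guess_green_type(crop), guess_green_type(forest))
```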
So for those who want big data, AI and ML in their profile, GIS offers that, because the data coming from remote sensing arrives at very high volume and frequency. Then there are hard and soft classification. Hard classification assigns each pixel to exactly one class; it includes the traditional machine-learning approaches, unsupervised, semi-supervised and supervised, which we discussed on the previous slide. Soft classification assigns each pixel partial (fractional) membership in several classes, for example through fuzzy or mixture models. Then there are recent machine-learning trends and the basic KDD (knowledge discovery in databases) approaches: knowledge-based classification, rule-based classification, data-driven products, reinforcement learning approaches and ensemble methods. An ensemble combines several methods. Recent machine learning has seen many developments in which hard classification is done by giving some input, somewhat like semi-supervised input, and the computer learns from the data, trains on it and then makes predictions. As a case study, I will discuss one paper from our group, published in the Journal of Agrometeorology. It is an Indian journal of the Association of Agrometeorologists. Almost all agricultural universities subscribe to it because it is the association's journal. It covers many applications, and a key one is GIS. The paper we will discuss briefly, on advances in data, is the village-level identification of sugarcane in Sangli, Maharashtra, using open-source data. Why sugarcane in Maharashtra? Maharashtra has sugarcane belts, and a lot of land is under sugarcane cultivation, especially in Sangli. Many farmer debts and suicides are linked to crop management issues in this area, so we would like to contribute.
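The difference between hard and soft classification mentioned above can be shown on a single mixed pixel. In this sketch, soft memberships come from inverse-distance weighting to each class mean, which is only one simple choice (fuzzy c-means or classifier probabilities are common alternatives); the hard label just keeps the largest membership. The class means and the pixel are hypothetical (red, NIR) reflectances.

```python
def soft_memberships(pixel, class_means, power=2):
    """Soft classification: a membership share for every class (sums to 1).

    Uses inverse-distance weighting to each class mean, one simple choice
    among many.
    """
    weights = {}
    for name, mean in class_means.items():
        distance = sum((a - b) ** 2 for a, b in zip(pixel, mean)) ** 0.5
        if distance == 0:
            # Pixel sits exactly on a class mean: full membership in that class.
            return {other: 1.0 if other == name else 0.0 for other in class_means}
        weights[name] = 1.0 / distance ** power
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def hard_label(memberships):
    """Hard classification: keep only the single most likely class."""
    return max(memberships, key=memberships.get)


# Hypothetical (red, NIR) reflectance means for two classes.
class_means = {"vegetation": (0.05, 0.50), "bare soil": (0.25, 0.30)}
mixed_pixel = (0.10, 0.45)
m = soft_memberships(mixed_pixel, class_means)
print(m)              # roughly 0.9 vegetation, 0.1 bare soil
print(hard_label(m))  # vegetation
```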
So if you know the yield and the acreage, you can manage water and soil resources and reduce the burden on farmers, and by reducing that burden, you reduce suicides. Farmer suicides happen because the crop is lost: farmers put a lot of money into the crop and the yield does not come. If we can tell them early that the crop is not growing properly and the yield will not come, they can stop investing, save a lot of money and try again the next year. If they do not, and keep pouring money into fertilizers, water and labor without getting it back, they are pushed to the brink of debt. That is one of the biggest reasons for farmer suicides. In this study, we had a need statement: we wanted to map farmers' data using open-source software and open-source data so that anyone can run it. There are many advances in classification, so we wanted to take AI and ML methods and see which performs best. We had census data for accuracy assessment, census here meaning data from the ground, as well as field data from our own field visits. We used two types of satellite data, Landsat 8 and Sentinel-2, because both have higher spatial and temporal resolution than the Indian satellites. Only the basic bands were used, some indices were derived, and the classification was done with three supervised machine-learning classifiers: classification and regression tree (CART), support vector machine (SVM) and random forest (RF). This is not a traditional method; as I said, these are AI and ML models. They do not fall under semi-supervised classification, but you still have to give labels, because you give the computer a lot of labeled data.
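The census and field data mentioned above are what make an accuracy assessment possible. Overall, producer's and user's accuracy are all read from a confusion matrix of reference (ground) class versus mapped class at the check points. The following is a minimal sketch; the eight check points are hypothetical and are not the paper's data, though they include sugarcane mapped as barren, the kind of error discussed later in the lecture.

```python
from collections import Counter


def accuracy_report(reference, mapped):
    """Overall, producer's and user's accuracy from paired labels.

    reference: ground-truth class at each check point (field or census data).
    mapped: class the classified map gives at the same points.
    Producer's accuracy = correct / reference total for a class (omission side);
    user's accuracy = correct / mapped total for a class (commission side).
    """
    if len(reference) != len(mapped) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    confusion = Counter(zip(reference, mapped))
    classes = sorted(set(reference) | set(mapped))
    correct = sum(confusion[(c, c)] for c in classes)
    overall = correct / len(reference)
    producers, users = {}, {}
    for c in classes:
        ref_total = sum(confusion[(c, other)] for other in classes)
        map_total = sum(confusion[(other, c)] for other in classes)
        producers[c] = confusion[(c, c)] / ref_total if ref_total else None
        users[c] = confusion[(c, c)] / map_total if map_total else None
    return overall, producers, users


reference = ["sugarcane"] * 4 + ["grape"] * 2 + ["barren"] * 2
mapped = ["sugarcane", "sugarcane", "sugarcane", "barren",
          "grape", "sugarcane", "barren", "barren"]
overall, producers, users = accuracy_report(reference, mapped)
print(overall)    # 0.75
print(producers)  # how much of each real class the map found
print(users)      # how reliable each mapped class is
```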
It will make the signature files and then classify, so in the end each sugarcane pixel is classified. We used multiple models, CART, RF and SVM, that is, classification and regression tree, random forest and support vector machine, and two datasets for each algorithm. You can see that the results all differ. CART on Landsat 8 is not the same as CART on Sentinel-2, and on the same Landsat 8 data, CART and RF differ; the maps differ both horizontally and vertically. Green is the sugarcane we are trying to map. Grapes are also grown in some parts of Sangli, but Sangli is mostly a sugarcane belt with many sugar factories. When we looked at the results, we assessed the accuracy; this is where we used the data we collected, so it is essentially a supervised classification. The accuracy assessment showed that both the type of model and the data fed into it play a role. The highest overall accuracy, 78%, was obtained with Sentinel-2 and the support vector machine, whereas the support vector machine with Landsat 8 did not perform well. The producer's accuracy and user's accuracy are also reported. The paper I have shared is open access; please read it and see how the model was built. It was done as an open-source workflow in QGIS, so anyone can run it. This is advanced material, but I want to introduce it now so that you can learn some of these techniques and apply them in your research. We can also see that the spectral signatures and the enhanced vegetation index (EVI), which is computed from the bands, vary spatially and temporally. They vary temporally because the growth period of sugarcane is different. The index is at a higher value in the October-November period, and at other times the crops look the same.
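The EVI-based filtering described here and at the start of the next passage (keep only pixels whose EVI in a chosen two-month window falls in the target crop's range) can be sketched as below. EVI uses the standard MODIS coefficients, EVI = 2.5 × (NIR − Red) / (NIR + 6 × Red − 7.5 × Blue + 1), with reflectances in 0-1. The window, thresholds and pixel values are hypothetical and are not the values from the paper.

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard MODIS coefficients
    (G=2.5, C1=6, C2=7.5, L=1); inputs are surface reflectances in 0-1."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


def window_filter(evi_by_pixel, months, low, high):
    """Keep pixels whose mean EVI over the chosen months lies in [low, high].

    evi_by_pixel: dict pixel_id -> dict month -> EVI value.
    """
    kept = []
    for pixel_id, series in evi_by_pixel.items():
        mean = sum(series[m] for m in months) / len(months)
        if low <= mean <= high:
            kept.append(pixel_id)
    return kept


print(round(evi(0.4, 0.05, 0.03), 3))  # a healthy-vegetation pixel

# Hypothetical EVI time series for three pixels.
evi_by_pixel = {
    "p1": {"Sep": 0.4, "Oct": 0.7, "Nov": 0.75},
    "p2": {"Sep": 0.5, "Oct": 0.3, "Nov": 0.35},
    "p3": {"Sep": 0.2, "Oct": 0.65, "Nov": 0.6},
}
print(window_filter(evi_by_pixel, ["Oct", "Nov"], low=0.6, high=1.0))  # ['p1', 'p3']
```

Picking the window where the target crop's EVI separates from the others is the key design choice; in months where all crops look alike, no threshold can separate them.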
So if you take a sample here, the values are almost the same, and you cannot differentiate between sugarcane, wheat, grapes and banana. But here you can clearly see that wheat is separated, grapes are separated, and sugarcane is much lower. So if I apply a filter in these two months, saying I want only EVI values around zero, all the other crops drop out, because this one is at 0.8, this one at 0.6 and banana at -0.8. Only the green class remains, and all the other pixels are removed. That is how the classification happens, and this is essentially what the AI and ML models do. You can see clearly that, according to our accuracy assessment, the best model was the support vector machine. Going back to the table, Sentinel-2 gave better accuracy than Landsat 8, so the Sentinel-2 result is more accurate. In this box, you can see some brown, barren pixels creeping in that are actually sugarcane. This is very important. Barren land does not require water, fertilizer or subsidies, so the government might ask why it should give subsidies to these farmers, even though the farmers are growing sugarcane in this parcel. A single parcel is very small, but when you accumulate all these data, it becomes a big dataset. This is where advanced remote sensing techniques can help in accurately monitoring and mapping land use and land cover. Other aspects of land use and land cover include forest cover, biodiversity mapping, species mapping and so on. These are covered by the Government of India's biodiversity information system website, which I have given. It also links to rural development, because forests and the areas around them are mostly rural.
And rural communities, especially tribal communities, engage very closely with the forest: they depend on it for livelihood options, water resources, soil nutrients, medicine, herbs and everything else. So this is also part of land use and land cover, and that information can be obtained from this website. With this, I conclude today's lecture. I will see you in the next lecture with a hands-on tutorial. Thank you.