Hello everyone, welcome to the NPTEL course on rural water resource management, week 8, lecture 5. We have been looking at the issues in rural water resource management, ranging from the application of funds to ownership and data issues. In the last lecture, we looked at why the data is not enough and what the issues are in the data currently available. We will continue that before we recap this week.

Sources of data quality problems. Data quantity is an issue and data quality is an issue. When you talk about data quantity, either the data is there or it is not. Not a lot of data might be present. For example, instead of weekly it is monitored quarterly, or there are not enough wells. If you are monitoring groundwater, you are not doing it at every village but at one or two stations per state. So these are the quantity problems.

What are the quality problems? A lot of problems arise because of data downloading, entering, etc. Let us look at the key sources of error as per this publication. It is a big issue, and that is why there are a lot of studies on it. Data entry by employees is pretty big. Almost 76% of all the errors are because of data entry. This covers the human interface that enters the data and also automatic data that gets converted from one form to another. Data entry by customers is also there; others would also enter the data. Changes to source systems: this is your instrument which has been changed. Maybe the instrument was collecting data for 10 years and after that it slowly starts giving errors, because it is a machine and you have to replace the machine. Data migration or conversion projects: this is what I meant when I said that when you convert from one data format to another, or move data from an instrument through your laptop to your dashboard, there could be data issues because of format, how the data is shown, unit conversion and so on. Mixed expectations by users.
So some people may want the data in a different unit. That is where that comes in. External data is data that comes into your system from outside. Then there are system errors and others. These are very small compared to the major ones, which are data entry by employees, then data migration and instrumentation errors. So the issues with quality are pretty big. And please understand that if the quality is not good, it is equivalent to having no data. In both cases the data is effectively not available. There is no point in having bad quality data out of which you cannot get anything.

In the report I mentioned in the last class, which was done by the World Bank, we found that of all the South Asian countries, some readily share their data. But the data was in such a format that not much could be taken out of it for climate change impacts. So we had to stop and say, okay, the first recommendation is: please collect good data. This is very, very important, and most important now because of climate change.

So enough of the problem statements and the sources of issues and errors; let's look at the ways forward. We want to understand how we can rectify these issues. Data quality sits alongside data governance, where data governance covers data sharing and how you collect and store data in a particular matrix and then do the analytics. So the quantity part, the governance part, is on one side, whereas the data quality issues are on the other side. And there is an overlap: how do you manage to get good data out? In data quality, parsing, matching and profiling are done to get better quality data. Parsing is conversion from one form to another, matching is matching data from primary and secondary sources, and then there is profiling. You do have some overlaps, some methods by which data quality can be enhanced. On the data governance side, data availability is big.
You can have open source data, and then there is compliance: how you use the data and who can store the data. Those rules and regulations can be kept. But what is more important for moving forward is the central part. What you have is an overlap between the data quality team and the data governance team, which is that both require data cleansing. To have good data quality, data cleansing is important, and the same holds for data governance.

Among the methods, there is enrichment of data, which means that if you don't have good data or there are data gaps, you can enrich the data. You can add data to fill the gaps through multiple mathematical models. Monitoring can be increased. Then there is generalized cleansing, where you clean the data using statistical and other methods, and standardization, wherein you create a standard or norm for the data.

For data consistency issues, a lot of steps can be taken, starting with consistent analytics. You don't run the data through some statistical packages just to make it look good. It has to be consistent with what has been done across the world; what is accepted in the literature should be done. So please make sure you understand this when you do rural data management and deal with rural data issues. There are a lot of statistical packages that can make your data look good, but that is not the point. When you do data analytics and data work, it is important to have consistent analytics so that the results can be readily compared across the last 10 years or the last 5 years, and also compared with other studies.

Metrics and reporting are also very important: how do you create new metrics for data indicators and how do you report them? And then data selection, which is also important. You cannot take one station to explain the rainfall in the village and then suddenly, after five years or two years, take another station.
Either you have to combine the stations as one unit, or select only one dataset and then convert and reformat it into the form you need. So data issues do exist, both in quality and quantity. However, the methods for addressing these data quality and quantity issues are almost the same in multiple settings. That is what we are coming to here: there are overlaps in how you tackle data quality and data quantity, and you can do these exercises in a good fashion.

Remember, the aim is to better manage the water. If you want to make the data look beautiful and you clean the data without any objective, then the whole point of data for water management is being taken for a ride, which means it is not correct. If there are data issues, it is better to try to salvage as much information as possible. Otherwise, you just have to say the data is not good: let me collect the data to understand the problem. Don't jump into solutions with bad data. That is bad science. All of you are going to be young scientists or young researchers learning these data methods. There are a lot of methods now to make the data look better or look good. Don't fall into those traps. The basics have to be strong, and how you select a model is very important for data cleansing.

I'll give you an example from data enrichment, or gap filling. Some people use an average, some people use a 10-year average, some people use nearest neighbors or imputation methods. So there are multiple methods to fill the data gaps. But what is more important is: is it making sense? Logically, against other data, is it making sense? If it doesn't make sense and you don't trust the methods, you can just say the data is not there, but we will use the other years' data to understand what is happening. And data is prone to errors because it comes from an instrument.
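To make the gap-filling options mentioned above a little more concrete, here is a minimal Python sketch of two of them: filling with the overall average and filling from the nearest observed neighbors. The function names and the sample groundwater depths are hypothetical and only for illustration; they are not from the lecture, and a real study should pick its method from the literature and check that the filled values make sense against other data.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace missing readings (None) with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)  # raises StatisticsError if nothing was observed
    return [avg if v is None else v for v in values]

def fill_with_neighbors(values):
    """Replace each missing reading with the average of the nearest observed
    reading on either side (or the single nearest one at the ends)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Search backwards and forwards in the ORIGINAL series for observed values.
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if values[j] is not None), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if values[j] is not None), None)
        candidates = [x for x in (left, right) if x is not None]
        filled[i] = mean(candidates)
    return filled

# Hypothetical monthly depth-to-groundwater readings in metres; None marks a gap.
depths = [5.0, 5.5, None, 6.5, 7.0, None, 6.0]

mean_filled = fill_with_mean(depths)
neighbor_filled = fill_with_neighbors(depths)

print(mean_filled)
print(neighbor_filled)
```

The two methods can give different values for the same gap, which is exactly why the choice of method has to be justified rather than picked because the result looks good.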
Instruments have errors, and they will give some issues due to management and maintenance. Also remember that in many cases these instruments are not there to measure these parameters directly. Take groundwater as an example again. A groundwater instrument measures the pressure, and the pressure is converted to a water level. Take piezometers, for example. A piezometer works on the principle that there is a spring, and as the water level increases, the force on the spring increases, and the compression of the spring is converted to a water level. So see how many conversions there are: from the water level to the force on the spring, how the spring compresses is converted to another parameter, and that parameter goes back to your thickness of water. Keeping it simple, these instruments do not directly measure your water parameters. However, you can convert their readings into a parameter through algorithms and equations. So this is a point where you need to be careful about which ones you use for water management, which methods, etc.

Another thing is convergence of funds and data. As these works are being done, you can see that there could be convergence of funds to do the work. But there is no data for monitoring and no fund for monitoring. Why is that happening? Again, as I am saying, the mandate for these agencies, MGNREGA on the left and IWMP on the right, is to manage the water, not to monitor the water; it is to create better land resources for agriculture, not to monitor the agricultural activity in the field. So it is important to understand how you could put monitoring into this funding network. You need to be creative, talk to the agencies and make the case. For example, I would put it this way: to enhance soil moisture, I need better monitoring of the soil. To enhance multi-cropping, I need more data on which crops are necessary and how we grow them.
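As a small illustration of the earlier point that a groundwater sensor measures pressure and the water level is only derived from it, here is a sketch using the standard hydrostatic relation h = P / (ρg). The constants, function names and the assumption that the sensor reports gauge pressure (already corrected for atmospheric pressure) are mine, not from the lecture; an absolute-pressure sensor would need a barometric correction first.

```python
WATER_DENSITY = 1000.0  # kg/m^3, assumed fresh water
GRAVITY = 9.81          # m/s^2

def pressure_to_water_column(gauge_pressure_pa):
    """Height of water above the sensor in metres, from h = P / (rho * g)."""
    return gauge_pressure_pa / (WATER_DENSITY * GRAVITY)

def depth_to_water(sensor_depth_m, gauge_pressure_pa):
    """Depth from ground surface to the water table, given how deep the
    sensor hangs below the surface and the pressure it reads."""
    return sensor_depth_m - pressure_to_water_column(gauge_pressure_pa)

# Hypothetical reading: sensor installed 20 m below ground, reading 49,050 Pa.
column = pressure_to_water_column(49050.0)
depth = depth_to_water(20.0, 49050.0)
print(f"water above sensor: {column:.2f} m, depth to water: {depth:.2f} m")
```

Every step here (density, gravity, sensor depth) is an assumption that can introduce error, which is why the derived water level needs the same scrutiny as the raw reading.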
Okay, so the other one is micro irrigation. Micro irrigation can be automated, which means water can be sent through drip irrigation or precision irrigation, specific to the point where it is needed. All this requires data. A soil moisture reading can be converted into a signal to turn on the tap for your plant. Now, how does the instrument know that the soil moisture is low and water should be applied? It knows because there is a moisture sensor in the soil. And that is important for understanding the data for soil. So you can indirectly put monitoring costs into these frameworks. The same goes for drought proofing. How do you know it is a drought unless you are measuring soil moisture? You can say, okay, the rainfall did not come for two years. But my point is that rather than saying it is a drought after two years, you could capture the soil's water content if you have a meter, and then that data can be converted into management.

Moving on, I will give you a very important paradigm shift which is needed for converting data into action and then preserving rural water. Let's take the example of a child in a rural village waiting for water while her mother is pumping. The water is not coming. What happens in a normal situation when such an issue arises is that there is a conversation only within this network, wherein the policy makers learn that there is no water. For example, there is a mission to provide water for the public, and they find that the water is not coming. They go back to the budgets, get more budget and put it back into the water issue. So more pumps, more augmentation of water. However, there is a gap. Let's take a live example: when Chennai ran out of water last year, water was brought in by trains and tankers. Even now, tankers are still coming in from outside. But is that a solution? How long can this go on? That's what I am saying. Suddenly there is an issue. Budget is released to get tanker water.
Good, happy. Everyone is happy. You can move on. Then the monsoon comes and people forget it. And then suddenly another drought happens. So it is not a solution but a short-term fix. Suppose your tap is leaking: you put tape around it, and then you go get the plumber and fix the tap. Until then the tape acts as a short-term solution. It cannot go on like this. It is a waste of money and a waste of natural resources.

So what I am proposing here in this class is that once such an issue happens, yes, some short-term fixes can happen. In the meantime, the agency plus the monitoring system has to evolve to collect data. There are multiple sources from which you can collect data. I won't get into the detail, it is beyond the course, but some of them are remote sensing data, satellite data, surveys, apps, and then big data for capturing these activities. Then it goes to collecting all these into bins, storing them in data archives and converting them into maps or indicators and metrics, which we discussed in the previous slide.

What happens next is that you take the data, convert it to maps, transform and clean it, and then it goes into a data server. And then it goes to explore, clean, analyze and share. There could be several levels of cleaning. Once the data is cleaned and analyzed, it can be shared, and then more research and teaching can happen using the data. And then an extension network can come in; for example, your MGNREGA can come and say, okay, I'll use your data for better water management.

So it goes like this. I am going to give you the steps. Define the problem or issue. Identify the big data methods by which you can collect the data. And then there are four Cs, which are collect, clean, collaborate and communicate. You collect the data using your methods: observation data, primary data, secondary data.
Then you start cleaning the data to remove uncertainties, bias, errors and gaps. You use statistical methods or proven methods to make the data better. Then you collaborate with other data or other agencies to make the data better. And then you run your analysis and communicate the results. What is the point of running this analysis if you are not giving it to the general public? So a lot of importance is given to communicating the data. Once the data and the results are communicated, a lot of research and dissemination can happen for better water resource management. Remember that all this data comes to a platform where anyone can use it and make sense out of it for management or for research and dissemination. And then it goes back to the policies and the bank, where money can now be released on better terms, for better long-term plans. This is a dynamic and cyclic process. So wherever there is an issue, the system can go one step backward and then work its way through again. Suppose I get to the AI model and find that some of the data is not correct. So I go back to the data and collect it again.

Okay. So let's take a recap of week eight. This week was very important for looking at water resource management, after we had looked at the hydrological cycle, the key parameters, etc. We looked at the water management issues: where are the issues, how do they arise, who are the key players, etc. And then we looked at specifics on how to improve water management. Remember, in the water management issues, we looked at the structures and the importance given to water, and then we went into the water management issues. We looked at how you can improve water management through collaborative work between the private and the government sector. We looked at the role of NGOs and how they could help. And most important among the issues was capacity.
The local people have to be trained so that they can address these issues in the longer term. In the infrastructure issues, identification of key infrastructure is important. There are a lot of losses of these structures: check dams washed away, groundwater wells defunct, all those terms. That happens when there is an infrastructure issue. And then there is non-representativeness. Take the time to understand these issues, especially the infrastructure. There is something called a one-size-fits-all approach. Suppose check dams worked in Tamil Nadu. It doesn't mean they have to work in Karnataka or Kerala, which are next to Tamil Nadu, and there is no reason to assume they will work at longer distances from Tamil Nadu. So it is very important to understand that there is no one size fits all. I cannot use one check dam design across India. There are multiple methods to construct a check dam, different civil works, and multiple designs. You need to understand all of these for infrastructure issues.

The most important issue we saw is maintenance and ownership. It is not right for the locals to be happy when there is a water infrastructure but not take care of it. It happens in every single system I have seen. I have seen check dams not being maintained, wells not maintained, polluted tanks, people misusing the water, etc. What happens then is that there is no ownership for the maintenance and conservation of the water bodies. And we saw that in those cases, public participatory exercises have been shown to be very beneficial for the local setting. I showed you the case study of the N M Sadguru Foundation in Dahod, Gujarat. They took a location with a lot of water losses from the watershed and built a check dam. The check dam was built using community participation and involvement, because the community had been struck by a lack of water for a long time.
What has happened is that the water has been stored beautifully, and with the water stored, all the banks have had increased soil moisture. Because the water is there, the soil can pull the water up to a particular level, and you can see the irrigation happening higher up from this water through lift irrigation. The entire range is getting a lot of greenery and increased soil moisture because of these check dams, because of these infrastructures. The infrastructures may be small, but the impact can be big depending on the geological setting.

So, to move on, maintenance and ownership was the key. A lot of stress was given to creating these public participatory networks. We looked at NGOs and discussed what an NGO is and how NGOs play a very vital role in bringing the stakeholders, the government agencies and the public, to communicate with each other and manage the water resource. Again, the NGO's duty is not to stay in one particular location and take care of that village or district or block. It is mostly to learn these new methods, apply them, and then take the learnings to another center and rework the same thing to see how they could be used there. A kind of upscaling. That said, there are some NGOs that do stay in a particular location, because they start to invest in more branches of work. Not just water, for example, but agricultural harvesting: taking the harvest, preserving it and exporting it to foreign countries. For example, grapes in Maharashtra are exported to foreign countries, and that can be tied back to water because grapes consume a good amount of water. So all these are necessary, and even when you want to set up these water management structures and work with the government, at the end of the day, you need data.
There are multiple issues with data that we looked at in the class today, and we also trained the students in using statistical or mathematical methods that are apt for bringing the data issues down. When I say trained, I mean that you have been given a recommendation of specific methods. I am not talking about average, mean and mode, or AI methods such as CNN, those kinds of things. I am saying that you do have parsing, you do have data cleansing methods. Go and look at the literature, see which method is most appropriate for your work, and then use it. There are many, many methods, which is why I didn't bring all of them here, but it depends on the data you download. For example, if I have groundwater data and need to interpolate it into a surface, I would use the inverse distance weighting (IDW) method or the kriging method, based on the thickness of the aquifer and whether it is a shallow, unconfined aquifer or a confined aquifer.

So issues are there, but hopefully there are multiple ways to move forward. And I hope this continues to go forward for better rural water resource management. There is a need for ownership. There is a need for public participatory networks, NGOs and good government agencies along with them. And, more importantly, there is a need to understand the data issues and create more data for better management. At the end of the day, I would like to conclude by saying you cannot manage what you cannot measure or monitor. Thank you.
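For readers who want to see what the IDW interpolation mentioned in the lecture looks like in practice, the following is a minimal Python sketch. The well coordinates, water levels, the function name `idw_estimate` and the default power of 2 are illustrative assumptions, not values from the course; whether IDW or another method is appropriate depends on the aquifer and should be checked against the literature.

```python
import math

def idw_estimate(x, y, wells, power=2.0):
    """Estimate a value at (x, y) from nearby wells using inverse distance
    weighting: each well is weighted by 1 / distance**power.

    wells is a list of (well_x, well_y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for well_x, well_y, value in wells:
        distance = math.hypot(x - well_x, y - well_y)
        if distance == 0.0:
            # The point sits exactly on a well: use the observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical observation wells: (x in km, y in km, groundwater level in m).
wells = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]

midpoint_level = idw_estimate(1.0, 0.0, wells)   # equally far from both wells
near_first_level = idw_estimate(0.5, 0.0, wells)  # closer to the first well
print(midpoint_level, near_first_level)
```

Evaluating the function on a grid of points gives the interpolated surface; a point closer to one well is pulled toward that well's value.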