Okay, everyone, I'm Nisha. Hi. This is the last presentation of the day. You guys made it. So proud of everyone. I'm going to talk about water data and the data project for India Water Portal.
Water data is fragmented, and we basically don't know what all exists. It hasn't been evaluated, so we don't really know what the whole picture looks like. That puts us in a position where we don't know what water information is known and where the gaps are that maybe civil society or other people can fill, because government hasn't been very good at putting all its data in one place where we can all find it. It's a common problem across all sectors. It just happens to be a fairly big problem in the water sector as well. And if we don't know what's there, we can't really ask them to improve the way they collect data, and we can't really figure out what's missing. So we're under the impression pretty much all the time that they don't have anything and that we have to start from scratch every time. I'll get to that a little bit later.
So the problem is large and really unmanageable. Data is this huge problem, especially government data, especially something like water or any kind of science or utility information that people need across the spectrum. It's just unmanageable to figure out how we get this essential resource to everyone. How do we make sure the water is high quality? Do we do 24-7 supply, these types of things? Where do we get the water from? What's wrong with it if you get it from the ground? If you collect rainwater, where are the issues, and how do you set up a system so that people are covered pretty much all the time?
So the solution for us, and this is the India Water Portal stance, is that you have to break it up into small chunks, and this is how we hopefully will tackle some of the problems. You have to ask the question: how much water is needed to meet the demand of all water use?
We actually don't know how much water we use. No one knows. It's not that the US knows or Europe knows exactly, but they have a much better idea, since most of their domestic and industrial use is metered and monitored by the EPA and various other bodies. Here there are no meters, people don't really track their domestic water use, and industry is not really required to report honestly how much it's using. If anything is known, agriculture is the best known, in terms of how much is used to irrigate, but there is a huge gap in what we know about how much water we're using.
How much water is available for us to use is also something no one really knows for sure. There are different issues depending on where your water source is, and no one knows exactly how much water is in the ground or, obviously, how much we're using. If you don't know how much we're using or how much is in the ground, you don't know how much is left. You don't know how much water gets recharged into the ground after rain, or from surface water, or anything. These are things we take for granted a little bit, but in terms of data, in terms of hard facts about what's being used, there isn't any. It's an unknown.
Then there are government schemes: what they are and how they're being implemented. Some of this is known, but it's done on a project basis, not through an overall funding and implementation model where you get a good idea of the impact of a government scheme in terms of providing water. So this information isn't really well known. The states are required to collect this information and distribute it, and it's unclear how accurately they report this data.
So IWP, India Water Portal, is a content warehouse for water information, and they started a data project.
So the data project aims to understand what water data exists and to create a community around using that water data to enhance projects, build better advocacy, and basically just get a better idea of what's going on in terms of the water we use and the water we need.
Just real quick, the data project is in a pilot phase right now. We're doing a lot of research. We found 200 data sets online. So there is water data that exists online. We just didn't know it was there, and it's trapped in PDFs and not very easy to find or to really use in any kind of way.
One of the first projects was to take 100 years of rainfall and climate data and put it online. We created this little tool. You can put in what state, what district, what type of data you want, and what time frame within those 100 years, and you can generate a graph. It's pretty nifty. You see what's happening. You can also download the data. People have used it to do rainwater harvesting projects and various other things. People have also used it to create this little thing called Isle of Bangalore, where you can slide it up and down. It tells you the temperature and how Bangalore has the best temperature, and it does show you this in the morning.
And then we created the data finder. After we found these 200 sources of data, we added metadata to them. Now you can actually search and find the source. You can download some data but not all of it, and this is the way we hope to tackle the finding-the-data part. It's not so much open yet, but at least we all know where most of it is now, which is a huge step forward. So in terms of... Sorry, you had a question?
Just real quick, the data project has five tracks. We're going after data research, which is discovering the data sources. We're going after some of those data sources and putting them into Excel formats, so converting PDFs into Excel, scraping websites, these types of things.
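As an aside on the rainfall tool described above (pick a state, a district, and a time frame, then graph or download the result), the selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `select_series` helper, and the numbers are hypothetical and are not taken from the India Water Portal tool or its data.

```python
from typing import List, NamedTuple, Tuple


class RainRecord(NamedTuple):
    """One yearly rainfall value for one district (hypothetical layout)."""
    state: str
    district: str
    year: int
    rainfall_mm: float


# Made-up numbers for illustration only; these are not real observations.
SAMPLE = [
    RainRecord("Karnataka", "Kolar", 1950, 710.0),
    RainRecord("Karnataka", "Kolar", 1951, 655.5),
    RainRecord("Karnataka", "Kolar", 1952, 802.0),
    RainRecord("Karnataka", "Tumkur", 1951, 590.0),
]


def select_series(records: List[RainRecord], state: str, district: str,
                  start_year: int, end_year: int) -> List[Tuple[int, float]]:
    """Return (year, rainfall_mm) pairs for one district, sorted by year."""
    rows = [
        r for r in records
        if r.state == state
        and r.district == district
        and start_year <= r.year <= end_year
    ]
    return sorted((r.year, r.rainfall_mm) for r in rows)


series = select_series(SAMPLE, "Karnataka", "Kolar", 1951, 1952)
for year, mm in series:
    print(year, mm)
```

The resulting list of pairs is the kind of series that could be handed to a plotting library or written out as a CSV download.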
We created a platform, and we're also going to share data across multiple platforms. Then community building: events like this and other events, to get people to understand that this data exists and that people should start doing things with it. And analysis, visualization, storytelling. That's the last component of what our project is doing.
So the data that we have, we don't know if it's good. This is the big question with government data. The primary assumption is that government data is not good data. But that's a theory. We don't actually know if that's true. We have no data to support or refute that statement. So as we go through this data, we find out what's going on. Things like: what's the methodology behind collection? What is this data telling us? What kind of stories can you get out of it? These are the things that government has largely controlled through reports and by putting its data in PDFs and in offline formats. But when you release the data in an open format, you're allowing other people to tell different stories. And then, like we said this morning, you will put layers upon it, and now we can find out if this data actually is useful to anyone. Whereas government is now on autopilot, just collecting data because they've always collected data, and they're going to continue to collect the same data over and over again, even though it's not necessarily good.
This is something that was developed in the U.S. called the transparency cycle. It's one of the things that we hope to go through with water data, but also with all data. There are multiple groups of people who should have access when we're dealing with data. It should be something that's not just for programmers or for researchers; people like developers, engaged citizens, and journalists should be using data in some way as well. And this is the route it goes through: from getting data from government, to using it to tell stories, to using it to engage government again and hopefully change policy.
And that's what the project hopes to do: go through this cycle to see if it actually works in India. To see whether, if you do some data analysis and you ask questions, government will be responsive to it. And we don't know which way that goes. So hopefully as we go through the process we can find that out for sure.
Okay, India water data. The real problem with data, and water data in particular, is that government has large sets of data across the country, but it's not very granular. And NGOs will collect very granular data, but it's not very large in terms of scale. So for instance, government will collect statewide, district-level data and average it, and that's what they will release. Whereas an NGO will collect for a couple of districts or for a state. It'll be very granular, and you can't compare the two. So ground-truthing of government data can't really happen in an effective way. You can't tell if the data government is using is good, because they give you such a wide view of it that, when you're trying to use data that you've collected, the methodologies are so different and the view is so wide that you can't actually compare the two and find out if what your data says contradicts or agrees with government data.
So Arghyam did a survey in 2009 in the state of Karnataka. They covered 20 districts and 17,000 households across the gram panchayats sampled. It was a rural study, so urban data is not represented here. This is a very crude graph comparing the data we collected with two government sources: the Central Ground Water Board, and Drinking Water and Sanitation, the Ministry of Drinking Water and Sanitation. So this is samples: the percentage of samples contaminated with fluoride. Fluoride is a problem in this area. High levels of fluoride cause fluorosis, which affects your bones and teeth and causes crippling, I guess. That was an incident. So it's a huge problem in this area, and in the northeast arsenic is a huge problem.
So throughout the country ground water is very contaminated, and if you drink a lot of it these types of things happen: sickness and disease. So the government keeps track of how much fluoride contamination there is. Look at these three lines. The blue line is CGWB. They collect samples from all over the country from a set number of wells. Then there is the Arghyam sample, which is the NGO collection, and Drinking Water and Sanitation, which is just a hub ministry that gets sample data from the states. So they don't actually go out and collect. They just get state-level data.
You can see that these are three very different lines. There's some connection, but if you live, let's say, here, you don't actually know if your water is good or not, and it's hard to tell from this which set of data you should actually believe. And this is the problem: with something as straightforward as how much fluoride is in water, which is a test that you can do and get a fairly accurate detection, you can't actually tell from the data provided whether your area is affected and whether you should take precautions or not. And we can't really check, as our NGO data is not very compatible with the government data, because our methodology was different and we don't know what methodology they use. What is their sample size? Is it representative, giving you a clear picture of what exactly is going on in the ground? How many villages are covered? How many households? Without knowing that level of detail, it's hard to know if they're actually collecting data that is representative of what's happening on the ground.
And considering people are getting sick, it's a problem that the government should be dealing with pretty head-on, and that's where NGOs and people on the ground want to keep their government accountable to something. Water quality is a huge issue just because it's a safety issue. People are getting sick.
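To make the comparison in the graph concrete, here is a minimal Python sketch of how a "percentage of samples contaminated with fluoride" figure can be computed per source and then compared. It rests on assumptions the talk does not state: the readings and source names are invented, and the 1.5 mg/L cutoff is the WHO guideline value for fluoride, which may not be the threshold any of the three sources actually applied.

```python
# Assumed threshold: 1.5 mg/L is the WHO guideline value for fluoride.
# The talk does not say which cutoff each source used.
FLUORIDE_LIMIT_MG_L = 1.5


def percent_contaminated(readings, limit=FLUORIDE_LIMIT_MG_L):
    """Percentage of readings strictly above the limit, or None if empty."""
    if not readings:
        return None
    over = sum(1 for value in readings if value > limit)
    return 100.0 * over / len(readings)


# Invented readings (mg/L) for one hypothetical district from two sources.
samples = {
    "source_1": [0.4, 1.8, 2.1, 0.9],
    "source_2": [0.5, 0.7, 1.6, 0.8, 1.0],
}

shares = {name: percent_contaminated(vals) for name, vals in samples.items()}
gap = abs(shares["source_1"] - shares["source_2"])

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of {len(samples[name])} samples over the limit")
print(f"gap between sources: {gap:.1f} percentage points")
```

Even in this toy case the two sources disagree by a wide margin for the same place, and the percentages alone say nothing about sample size, well selection, or test method, which is the ground-truthing problem described above.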
So where do you go to find the accurate information you need? The answer is you go to these places, but the blue line and the green line are both government sources, and they say very different things. This lack of consistency and the lack of ground-truthing ability is a primary problem. And with something like water, which affects us very deeply because we need it every day, it makes the data more important, and the accuracy of the data and how you collect it matter more.
So, to conclude: there are a lot of gaps, and there is a lot of work being done to hopefully fill those gaps. The Planning Commission did ask a group of people to write a report on water data and the management of it. I have copies of that report, and it's available online on the Planning Commission site. It stated some of these problems and recommended a lot of solutions, but it remains to be seen how committed government is to doing some of this. And water is a state issue, so it's really up to the states to decide how they want to test their water, keep track of it, provide access to it, and keep track of what they're doing and how well they're doing it.
So, to sum up, the data project hopes to deal with a few of these problems. We can't do it all. We're going to hopefully create some sort of framework for water data, saying what water data should exist, what does exist currently, and where the gaps are. And hopefully, through looking at the data that exists now, creating visualizations, and creating storytelling, we can push people to do a little more robust data collection and find some of the answers to the questions that people have, like "is my water safe to drink?", because right now the data that's available doesn't actually tell you in a succinct way. So that is my opinion. Questions, comments? Sorry?
So Sam, who is not in this room, is trying to create a sensor that you can hook up to your smartphone that would test water quality. But the kits are not too expensive, right? You're supposed to get a kit from your government if you want to test your water. In an urban setting it's not too bad, but the rural setting tends to be a little bit more mixed.
Government does collect it pretty consistently over time. If you go on to the Central Ground Water Board, you get data up to a point, but you're not going to find anything past 2009 in terms of water level or whatever. You might find 2010 or 2011 on the Ministry of Drinking Water, but in terms of robust, big data sets you're not going to find anything after 2009. So there's a lapse here. Missouri can actually correct me if that's not true.
Government does collect data continuously, but they publish it only once in a while. For instance, CGWB published one report in 2004, and in 2007 and 2008 they published one report on groundwater quality data. So we depend on the government and on when they publish it. They have monitoring wells where they are collecting data and monitoring it on a regular basis, but it's not published and it's not public.
And it's not consistent.
I work in the area of waste management, and to some extent I want to do something like this for waste and recycling, so I'm very interested in this session. The only thing I'm wondering is this: I will be sourcing the data from government agencies as well as NGOs. I'm currently attached to an NGO that provides waste-related services for people who are not serviced by BBMP. So how do you solve the copyright issues of data? For example, we service Google. Will Google share the data about its waste management and recycling so openly? Do you have contracts? Do you enter into contracts?
So copyright is a question for Pranesh, but we put up the Met data, the rainfall data, which we got not from government but from another source who got it from government, and they haven't said a word to us about taking it down. Copyright is something they don't go after unless you are going after something that would take money away from them. It's still kind of up in the air. There hasn't been a fight about it, basically. It's still a gray area, so we operate under the assumption that if you add value to the data, then it's okay to republish.
Just to add on to that, because I happen to know of it. They may not use copyright against you if it's not a monetary issue, but if they have a concern about how the data is being used, and they did some digging, they would perhaps be supported by copyright law in asking you to take it down. So, for example, many times agencies may not like what you're doing with their data, and they say stop doing it, take permission. If they don't have an actual right under any other law to say that you need to take permission, then they say copyright: under copyright, this is our data. It may not always be their data, but sometimes it might be. I mean, it's a hundred percent a real law. They will go after you.
Just to add on to the copyright side of it, which I have personally dealt with in quite a lot of countries: Indian copyright laws are very, very vague, okay? That is one thing I would like to add from the legal perspective. Even if it goes to court, it doesn't stand. Quite a lot of my work was stolen over the last year, research data which was stolen, and I couldn't get it back, thanks to Indian copyright laws, okay? So the only time you really need to worry about government data, or, you know, public data, is when you're misusing it, and there is a lot of difference between using it and misusing it, right?
I'm not worried about government data, because it's like any data. I'm talking about...
Sorry to interrupt you, just one bit of announcement: if anybody's interested in your idea, I meant the food court.
Sure. I'm worried about clients like these. RMZ Infinity is one of our clients; we do waste management for them, so we have data on the recycling rates across probably two years.
Who collected it?
We collected it.
You collected it.
Because we are collecting the waste and we are recycling it for them, so we have the site, yeah. But we haven't entered into a contract with RMZ Infinity, apart from giving the data back to them: this is what's happening, this is what you're recycling.
I mean, I think that would depend on your contract.
Yeah, that's what I'm thinking. These things can be built into the contract, which says that if RMZ has asked you to collect this data for a certain project, then once the project is done we are free to use this data in any form. There is always a failure policy that all these things happen. You can still add on to the contracts.
Yeah. I mean, better than me, your legal advisor, if you have one, can help you, okay? But other than that, generally speaking, copyright is not a very huge issue in India at all. Not yet. There are two things which are never an issue: one is privacy and the second is copyright.
A lawyer for your company will be able to tell you both, depending on who you are. On this point, what would be interesting is what extrapolation you do on the data. If you do very aggregate data extrapolation, there's an argument to be made that you don't necessarily have to build in contractual terms on privacy. But if, for example, you're doing waste approximation which allows you to determine the personal habits of particular individuals or families or companies, that might get you into difficult situations. If you look at what some people do when they do this kind of data correlation in other countries, they say: we perform these services for you, and one of the rights you give us is to do broad-level data analysis of what you give us. They build that into their standard contracts, and that allows them to do basic data metrics. Many companies have that. That's about it. I would start adding that.
Isn't it that if Nisha were to do a story on this and, you know, make allegations about the government data, then they could do that? Is this graph a violation of their copyright?
Actually, no. I'm saying if you do a story on this and it comes out in public. It depends on what she's using. For example, as Nisha mentioned, she's publishing data which they directly collected, and it's arguable that it is a copyrightable work, so they could try and use it. But it depends a lot, and more importantly she could assert a sort of public use defense. What's the term? Fair use. We don't have that exactly in India, we have a different thing, but you could assert it. But they can use it, and they have occasionally used it in India, not very often. In other countries whose provisions are all very similar to ours, people have sometimes used it. That's a common reason, like if you're offending, if anything for
example, copyright is sometimes enforced. As Nisha said, they very rarely actually enforce it except when they see a strong commercial thing, but if they are concerned that you're saying something which really offends them, one of the things they can assert is copyright, because they do have it, sometimes, not always.
There's a lack of clarity about what they will go after, what they won't, and what the parameters are. At one level, data is not copyrighted, because it's not a work of creativity.
Exactly, exactly, the data itself. But that's only for the last few years. But tables can be copyrighted sometimes, depending on what you're doing. So on pure data you're actually correct, but the way you express it, unfortunately, that's a hazy area. And you're right, in some other countries, like in the U.K., they have instructions to government: you shall not sue somebody who has built a data collection based on government data. But there's no such order in India.
I think it's a matter of fighting it out in the courts or somewhere where clarity has to be established, and they haven't gone after enough people to really cause that fight.
It's also a good thing in a sense. If you have a long-established practice of government not going after people using it, then when they do assert it, you can make this argument: you did not step in to enforce your rights for nearly a decade. But you have to convince a judge, and a judge may not be convinced.
But isn't this part of our conversation, right from the first event itself? You were there as well. Now we are seeing what the government is trying to do on internet privacy and censorship, with these kinds of questions where the term itself is not clear. I'm saying this copyright of data is still unclear; like you're saying, it's a hazy term. Shouldn't we as a group, or maybe we
should try to do it?
So these are the areas where, besides having general open data policies, besides even broader things where you're actually convincing them to open data, you just ask them: don't sue us, make a commitment. You could ask the copyright office to issue instructions on this, and that's completely doable. People have done that in other countries. For example, before the US came up with its open data policy, there were three or four years' worth of advocacy. People just went and convinced the copyright office and the Department of Justice: don't sue these people. Even with the US's public domain, they said don't even enforce Freedom of Information Act things against them. You can do all of that. These are incremental things that you can do, which is exactly why these conversations have been held here, right, for you to outline what you want.
And if you ask government for data, they usually will say to you, this is not public, you know, because they want that power to be like: I know you, you're going to be nice to me and do a nice thing, not use data to be mean to me, so here you go. They have to figure out if they can trust you with the data. And they'll say don't put it up. They'll give it to you, or they'll sell it to you, which is even worse: they sell it to you and you can't put it up. That's kind of their revenue generation. So how do we incentivize them to share, and then allow us to share? That incentivization process just doesn't exist. So they'll go after copyright whenever they see something.
One thing to also remember is that they are very often not clear themselves about what their actual rights or permissions are. For example, very often a department may have information, but they always believe that they need to get permission from somebody to share it. They're not actually clear themselves. It goes both ways. It's risk aversion, and it's actually true risk aversion, because for some of them
they're worried about how they can enforce it, and other people are afraid that they will be subject to sanction later. So that's why, when you advocate externally, you're saying: make this clear for everyone, so that it's also clear for government. Which is exactly what, I think, you may have heard other people on the DataMeet group talk about: the NDSAP, the National Data Sharing and Accessibility Policy, and documents like that. It's good for government also, because government departments can't share data amongst each other. The government has great data which they can't share across departments, because they're wondering half the time who has to give permission to send it across. So it's of value for them also to know these things: that copyright sometimes doesn't apply, or that you can share it. Obama answered it simply by doing one thing, by saying the general position is that you can share it, and he issued it as an executive direction under his office. In India, if the cabinet secretary did the same thing, it should work the same way. It can be done.
So I want to just add, because he said something very nice, which is great data. Did you say great data? If you didn't, then I am imagining it, because I'm thinking that in waste management there's a lot of grey market. There's a lot of illegal recycling market that doesn't come under the legality of the government. If I collect the demographics of the recycling workers, it's a completely informal industry, in Jolly Mohalla or wherever in Bangalore, and these are things that are not mapped yet. The government is not very excited to know that there are something like 15,000 waste pickers in the city, stuff like that. So this data is actually very nice to have put up somewhere, to be accessed somewhere. It's not validated by any legal standing. Everybody downplays it. Companies like this overplay it: you know, if your recycling rate is x, they want it to be x plus something, you know,
things like that. So in this sector you always have to be very astute in coming out with the analysis of the data, which is why I have issues, you know, ethical issues, about data sharing in this thing, which I don't know if you've faced in yours a little bit.
We talked about it briefly. I mean, we should have done a session on ethics, but hopefully we can talk about it on the Google group if everyone joins, because it's a hard question to answer, and I wish Pranesh was in this session with me, because he is really good at answering these kinds of things. What he says is that there are always harms that could come out of data, and open data in particular. There are harms that can happen, and you have to think about that. So if you have a list of how many people are scavenging and doing waste work, that could be a controversial list, or that could cause something, but it is a good thing to know, right? Where is that balance, and what do you do?
With water data, like the data from the Department of Drinking Water and Sanitation, which has a lot of data and updates it fairly frequently, it has been said pretty informally that the data is not good, because compared to baseline data collected by NGOs in certain areas, the data is blatantly false. So for instance in the northeast, the data on government sites will say the arsenic level is much lower than it actually is. It's these types of problems that make you ask, well, what exactly is the level? For that kind of situation you go, well, of course you would share your baseline data and find out what exactly the arsenic level is. But there's a whole political reason why that arsenic level on the government site is lower than it might be in actuality. It might not be a methodology issue. There's politics behind it, there's power behind
it. So you can never know; there are unforeseen consequences in sharing. But if you start with "sharing is good" and then work back from there, I think that's okay. What is a negative consequence you don't want to happen? Do you privatize, or how do you protect privacy? I don't really have the answers to these. I'm just talking off the top of my head right now, but it's an interesting question that has to come out at some point, and I think right now it's a decision that people are making on their own and not really making as a community or as a united voice. No one's really talking about it in a formal way.
We have 10 minutes, so any more? We can show something else. AJ, if you want to go through it.
This is just a map. Friends of mine from Stanford, they're Indian researchers working on water and sanitation issues, wanted help in setting up a map to crowdsource this data. I'm a tech guy who uses Ushahidi a lot, so I helped them set this up. There is no data yet because it was just launched yesterday or the day before. If you show the header, yeah: reports about water and sanitation issues. Ushahidi is a good crowdsourcing tool which lets you aggregate data through the website and SMS, and there are apps for it. We have set up SMS also. You see on the right side there are categories where you can report issues with water taps, ponds, streams and rivers, wells, sewage and defecation problems, or garbage. So basically water and sanitation issues. This is an effort to crowdsource this data through citizens, so let's see how it goes.
Have you sub-categorized these categories? Like in wells, what type of data?
See, they're still working on it. If you have any suggestions, we have to improve that, because I'm just handling the tech part and this thing is planned by them. So you can always write to them to suggest something, because it's still in
the process.
So it would be nice to have it sub-categorized, like you said. If you talk about wells, you want to measure the water quality of the well, the water level of the well, the location of the well, and the type of well: is it a tube well, a dug well, or a hand pump? All those things.
Yeah, just take it up. Just communicate.
Okay, there are no more questions. This has been a long day. I'm going to wrap it up. Thank you all for coming.
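As a footnote to the last exchange, the suggested well sub-categories (quality, level, location, type) could be captured as a structured report along the following lines. This is a sketch under assumptions: the field names, units, and the `WellReport` class are hypothetical and do not reflect how the Ushahidi deployment shown in the session was configured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class WellType(Enum):
    """Well types mentioned in the discussion (labels are assumptions)."""
    TUBE_WELL = "tube well"
    DUG_WELL = "dug well"
    HAND_PUMP = "hand pump"


@dataclass
class WellReport:
    """One crowdsourced report about a well (hypothetical schema)."""
    latitude: float
    longitude: float
    well_type: WellType
    water_level_m: Optional[float] = None  # depth to water, if measured
    quality_note: Optional[str] = None     # e.g. a test-kit result, if any

    def missing_fields(self) -> List[str]:
        """Names of optional measurements the reporter left blank."""
        optional = ("water_level_m", "quality_note")
        return [name for name in optional if getattr(self, name) is None]


# Example report with a made-up location and water level.
report = WellReport(12.97, 77.59, WellType.TUBE_WELL, water_level_m=35.0)
print(report.well_type.value, report.missing_fields())
```

Keeping the measurements optional lets a citizen report a well's location and type first and add quality or level readings later, while `missing_fields` shows what a follow-up visit would still need to collect.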