Hello. So, I just have a question for people in the audience, if you could raise your hands. How many people here are familiar with or have used OpenStreetMap? Oh, wait a minute. And how many people here are familiar with or have used any variety of GIS, geographic information systems? Oh, great. Then I can cancel my talk. So, I actually didn't know who was going to be here, and when I spoke with Dejesh and Nisha about this, I was expecting there to be about 20 people or so, and now there's like 200. So, I was going to speak about OpenStreetMap and about spatial data, sort of geographic information more generally. But then I thought that I might actually talk more about my own experience as a PhD researcher, cover some of the basic GIS and OpenStreetMap fundamentals, and then, you know, in the third part of my talk, if I have time, I'll start running through some stuff on the web, which we could also pick up in the afternoon in a more informal setting, when some of the other people here who work with mapping and spatial data can join in. So, if I don't finish by the end, at least the first two parts of the presentation should be okay. Now, does this work? Wow. Okay. I'm a historian. I'm doing my PhD in urban history. Everything I'm going to talk about is really related to the city where I live and study, Mumbai. It's broadly applicable elsewhere, but I can't really speak for more than that, and I also can't speak for uses of mapping in, say, ecological or environmental things. I'm talking mostly about the urban arena. And this is where I work, in the archive of the BMC, the Bombay Municipal Corporation, and these are the kinds of archives that I deal with. They are very, very difficult to find, to access, to repaginate, and to make sense of. It's a lot of really hard work, the kind that historians get a kick out of doing.
A bit like software programmers, in the sense that we spend a lot of our time dealing with code, but in my case the code is a lot of ancient, or not ancient, but sort of colonial documents of the municipal corporation. And it's really interesting what's happening. I mean, there are a lot of people who are always very negative about working with government, who say that you can never get anything out of government, that you always have to file an RTI. I've come back to India after five years, mostly of being in the U.S. at MIT, where I was teaching and doing the other things you do before they actually let you do your real PhD research. And during those five years, the RTI Act has been around and been enforced throughout the country. And I've noticed a dramatic difference in dealing with this particular office, in fact, where I used to go seeking information about the history of land, buildings, properties, tenancies, and leaseholds in the city. And actually, this type of situation is going away now. Not just in the municipal corporation, but within the state of Maharashtra, there's actually been a government order out for the past three or four years now that every municipal body, municipal corporation or council, has to install ISO-certified record-keeping and classification systems to stem the tide of RTI requests. RTI is literally pushing a lot of these institutions into a place where they can cope, because their own record-keeping practices have declined so much in the past 20 to 30 years. There are different explanations for that. A lot of it has to do with language and English. I don't think it has so much to do with corruption as with general overall inefficiency. And I happen to have been lucky to be working in the municipal corporation at the time when these new record rooms are actually being set up. So I get access to all sorts of wonderful data. And most data that is about cities is also about space. It can be mapped.
So I'm going to just run through some of my own experiences of dealing with a few different types of databases: some from the municipal corporation having to do with properties, some from the collector's office, which is a revenue department dealing with land, and also transport data from BEST, which is Bombay's bus company, and a project that I've been running with some of my friends for the past few months called ChaloBest. The problem that we've normally had for the past few years is, where do we first get our data? There's a whole group of us who spend a lot of time writing a lot of custom scripts to scrape data off of government websites. Before you could actually walk into a government office with a pen drive and have them actually give you something, which they do now, we had no other option but to do this. This is an example: get your property card online. You go to this website, which is a really, really badly designed website by the National Informatics Centre. Totally insecure. You type in your cadastral survey number, and it gives you, it's not very legible, but it gives you an entire ownership history. It gives you every detail about the property going back to the old British surveys of the 1870s and 1930s. Now, what I didn't realize when I first looked at this website is that every one of these pages has a unique URL. So that meant that I could actually sit for one night, write a script, download all of them, and put them into a database that I could then search for myself. Because I was interested in the history of land holding patterns in the city and I had no data to deal with. I mean, I had those big archives, you know, this type of stuff. But I didn't actually have something that I could search or discover things easily with, or establish patterns or trends or make connections, which is what all researchers basically do.
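To make the "one unique URL per page" idea concrete, here is a minimal sketch in Python of that kind of spider. It is not the script used for the property card site: the URL pattern, the table layout, and the function names are all assumptions made up for illustration, and the talk does not give the real address or parameters.

```python
import sqlite3
import time
from typing import Callable, Iterable, List
from urllib.request import urlopen

# Hypothetical URL pattern; the real site's address and parameters are not given in the talk.
URL_TEMPLATE = "https://example.invalid/propertycard?cs_no={cs_no}"


def fetch_page(url: str) -> str:
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def spider(cs_numbers: Iterable[str], db: sqlite3.Connection,
           fetch: Callable[[str], str] = fetch_page, delay: float = 1.0) -> int:
    """Fetch one page per survey number and store the raw HTML; return how many were saved."""
    db.execute("CREATE TABLE IF NOT EXISTS cards "
               "(cs_no TEXT PRIMARY KEY, url TEXT, html TEXT)")
    saved = 0
    for cs_no in cs_numbers:
        url = URL_TEMPLATE.format(cs_no=cs_no)
        try:
            html = fetch(url)
        except OSError:
            # Network and HTTP errors from urllib are OSError subclasses; skip that page.
            continue
        db.execute("INSERT OR REPLACE INTO cards VALUES (?, ?, ?)", (cs_no, url, html))
        saved += 1
        time.sleep(delay)  # pause between requests so the server is not hammered
    db.commit()
    return saved


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Return the survey numbers whose stored page mentions the given term."""
    rows = db.execute("SELECT cs_no FROM cards WHERE html LIKE ? ORDER BY cs_no",
                      (f"%{term}%",))
    return [row[0] for row in rows]
```

Storing the raw pages first and parsing them later keeps the download step simple, and it means a search like `search(db, "some owner name")` works even before any fields have been extracted.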
So I sat one night with a bunch of my friends and hackers and we actually spidered the entire NIC database and dumped it all into, you know, sort of a back end. And we created a very simple web admin that allowed me to search through it and find very interesting patterns. I learned, actually, that about 40% of the island city of Mumbai is owned by ghosts and zombies. These people don't exist anymore. They are no longer the holders of the properties in question, but their names are still on the books and the names are still on the website, including, I found, a friend of mine's grandfather, who passed away in 1950. He still owns a building right down the street from my house. So these are the kinds of fun experiments that I got involved with. But I never had a map to connect this to. These are all tabular databases, they're big lists of stuff, right? In this case, having to deal with property. Then I went searching for map data from the municipal corporation, through some contacts. In fact, Zainab was talking about the slum toilet study that we did. The same engineer who we worked with to do that study several years ago privately called me to his office one day and he said, bring your pen drive. And he actually gave me all of this spatial data of Mumbai. Now, it's beautiful stuff. I mean, you can't really see it here, but if you could see it in higher resolution, it's absolutely stunning. I mean, it's every building in the city. Now, the thing is that this is not really spatial data. It's just a bunch of drawings in AutoCAD. They don't have lat-long, you know, coordinates on them. They're not georeferenced. You can't really do anything with this except look at it and, you know, it looks really nice and you can color it and style it in different ways. None of these buildings are actually closed polygons. I can't treat them as data objects.
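What "closed polygon" means here can be shown in a few lines of Python. This is only an illustrative sketch, assuming a building outline arrives as a plain list of (x, y) vertices in drawing units; the function names and the tolerance value are hypothetical, and it does nothing about the separate problem of georeferencing.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def snap_closed(line: List[Point], tolerance: float) -> Optional[List[Point]]:
    """Return a closed ring if the outline's two ends meet within tolerance, else None.

    A ring is closed when its last vertex repeats its first. Outlines whose ends
    are further apart than the tolerance are left alone for manual cleanup.
    """
    if len(line) < 4:
        return None  # a closed ring needs at least three corners plus the closing vertex
    (x0, y0), (xn, yn) = line[0], line[-1]
    if math.hypot(xn - x0, yn - y0) > tolerance:
        return None
    # Replace the nearly-matching last vertex with an exact copy of the first.
    return list(line[:-1]) + [line[0]]


def ring_area(ring: List[Point]) -> float:
    """Area of a closed ring by the shoelace formula, in squared drawing units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Once an outline is a closed ring it can carry attributes and measurements such as `ring_area(ring)`, which is what makes it usable as a data object rather than just a drawing.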
So thus, and this was only just about five, six years ago, I decided to teach myself GIS. How could I fix this and connect it to real tabular data so that I could actually map out these types of trends? And then I went to MIT and I sat in the GIS lab for two years and figured out this is the wrong way of doing things. But in the meantime, I figured out how to deal with spatial data. What you just saw is actually generated from this, which is the development plan for Greater Mumbai, where all of these color codings and symbologies that you see on the side are for reserving certain lands for certain purposes, whether it's for education, open space, civic amenities, hospitals, clinics, whatever. Each of them has got a code. Each of them has got a certain type of reservation that is guaranteed by law. And the municipal corporation actually uses this to vet every building proposal in the city and any type of land development. Now, most people have not seen their own DP sheet. There are about 270 of them that cover the entire Greater Mumbai area, but you wouldn't actually know where all the gardens and parks in your area are supposed to be unless you look at this. And most of the time they're not actually there. They've been encroached, they've been sold, they've been, I mean, you must have heard about the housing scams; I don't need to tell you about Mumbai. Now, this was the thing that I went back to the municipal corporation to find out, because they have even more data that I could possibly link to certain types of geographic attributes and approach this problem in a different way. I guess the resolution is not very good. This is a fantastic thing that I just got a few months ago after a bit of plotting with the chief accountant's office. It is a listing of the ownership and tenancy history of every plot in the island city of Mumbai in a gigantic Excel spreadsheet. It's really, really exciting.
I know it doesn't sound like it. But I don't know how to map this. I don't even know where most of these places are, unless I go back and look at survey sheets and I take this number and I actually go through them and figure out where that number is. So it's a very, very painstaking manual process. These are the types of survey sheets that have the plot numbers on them. This, in fact, is my own neighborhood. My house is right down here. And I was thinking to myself then, okay, I've got this one big database on the one side. I've got these old survey sheets with these numbers on the other. Again, how do I map this? And this is still a problem that I myself am trying to solve, and I think a lot of other people who have got big databases that they want to map out will have the same kind of itch that they need to scratch. One way of doing this is, oh my God, this is really terrible. So this is in OpenStreetMap, the exact same survey sheet that you just saw. So this survey sheet is this, in OpenStreetMap. And if you can see here, I've actually gone and started hand-numbering every building. I've drawn all of these buildings myself as well, because I'm studying this neighborhood and the history of its land and property. It's very, very painstaking. But when you sit back of an evening and look at the work that you've done, it can be highly rewarding also, if you're into maps. And that's the type of energy and itch scratching that the OpenStreetMap project is built upon: people who have a real passion to explore, document, tag, and annotate their own spaces. And if you connect that enthusiasm to the kind of public databases that are now becoming available, not really through RTI, because RTI is kind of like a single shot pistol. You can just ask for one thing and you get it or you don't get it.
But what we need is more like an automatic weapon sort of approach, where we can actually get these big databases from the municipal corporation and parse them in certain machine-readable ways. And that's really the big problem. So I'll just, well, I'm going to go back to that. This is my final example, dealing with the transportation database. The resolution on this is not very good again, but this is BEST's, the Bombay bus company's, master atlas for running every bus in Greater Mumbai. It runs into about a thousand rows. It's got very, very precise timings, as you can see, for headway and run time. But this is again very, very difficult data to deal with, because this is maintained every month by the bus company and it's used as a way for them to keep tabs on their depot managers and make sure that the efficiency of services is balancing the load of passengers and things like that. So it's a really, really nice set of data, but if you'll notice here, there are a lot of missing values for headways at certain times of day, and we've had to develop logic and code to compensate for that, to make this machine readable and be able to pipe it out into things like Google Maps and OpenStreetMap and certain routing algorithms. It's a very, very difficult problem, but it's not insurmountable, particularly if the authorities start sharing data in these types of formats with you. And this is the most lovely thing that BEST has ever given to us, which is a list of all of the bus stops in Bombay with their names on them, as well as all of the routes that are there. So what we've actually done in this project that I'm running in collaboration with BEST and a group of people, which is supported by IHS and is called ChaloBest, is that we've now gone and mapped out most of these stops. There's a very simple web interface around this public database, and I think you can see it.
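As an illustration of what compensating for missing headways can look like, here is a small Python sketch. It is not the ChaloBest logic, which the talk does not describe; it simply assumes one route's headways are listed in time-of-day order with `None` for blank cells, and fills each gap from the nearest earlier period, falling back to the next later one.

```python
from typing import List, Optional


def fill_headways(headways: List[Optional[float]]) -> List[Optional[float]]:
    """Fill blank headway cells for one route.

    Each gap takes the value of the closest earlier period; gaps at the start
    of the day take the first known value. A route with no values stays blank.
    """
    filled = list(headways)

    # Forward pass: carry the last known headway into later blanks.
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value

    # Backward pass: blanks before the first known value take that value.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]

    return filled
```

Carrying a value across periods is a crude assumption, since a blank may really mean the route does not run at that time, so a real pipeline would want to flag filled cells rather than treat them as official timings.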
And over about a week or so we just threw it out there and had groups of students come over to our lab in the evening, and if they were from a certain area, I mean, I may know the bus stops in my own locality of Bombay, but I don't know where every bus stop in the city is located. I don't know anyone who does, but everyone knows their own area. So what we did was that we just built a very simple, quick and dirty web interface to browse this database and plop down points on the map and put them back into the database. It's a very, very difficult process, especially making the maps in the absence of spatial data coming to us from the authorities, who actually have this data. In spite of the fact that I know a lot of people in the municipal corporation, that I've been working there as a PhD researcher for two years, and that I know they have what are called shapefiles, which are the georeferenced vector data for all of Greater Mumbai, one of the engineers there said, I can't give you this. We won't even give it to MMRDA or the state government or the Slum Rehabilitation Authority, so why would we give it to you?
So that's the objective situation. It's not that the data is not there, and there are ways of getting at it, and for the next few months I'll keep trying to get it somehow, and I might be successful. And when I am, that actually makes a lot of this work that I was just describing much, much easier, and I'm going to go back to the CRS to actually figure out how to do what I set out to do, which is to map out property databases in the city. This is the Mumbai City Collector. This is where everyone comes to do a title search for their properties, and this place is like walking into a Dickens novel, the way that all these petitioners are around, and these ancient registers that they have to pull off of these racks, all written in longhand English that even I can't read. My friends in the US keep telling me about all the digital archives out there. This is a hell of a lot more fun than a digital archive, if you're into the thrill of the hunt. So to go back now, I'm going to change gears a little bit. Yeah, go ahead.
The shapefiles that you mentioned, do they have any kind of portal where they have saved it? Because I was trying to do it in the state of UP, and at least they have a GIS called Srishti. I was with the GIS officer in the NIC Lucknow office, and I asked him the same, for the shapefile. He said, we cannot, but what we can do is collaborate with you and host the application and let the application use the files.
See, I think every state is different and every body is also different. You know, I have been really impressed with how accountable and friendly the BMC is, which everyone thinks is the most corrupt body in India, whatever the complaints people have. But compared to that, state government agencies in Mumbai are completely closed. They shut the door in your face. They don't even entertain your request. So I think that it really varies. But up until this point, for spatial data, except for parliamentary constituencies and district
boundaries, and district boundaries were available as shapefiles online a couple of years ago; they've now been taken offline. Luckily I downloaded them then. We're still not there, though. There is, you know, open data, but spatial data has particular characteristics. When you say give me a map, we don't want a PDF or a JPEG, we want a shapefile. So I'm going to get into some of that now. So, what the fish is GIS? Just for those of you who don't know, and this should bring all of us onto the same page. There's a big difference between old school GIS, geographic information systems, and what a lot of people for the past five to, say, seven years now have been calling neogeography, which is this whole web 2.0, crowdsourcing, OpenStreetMap sort of trend. But the really crucial conceptual difference is actually about how we treat the map. I think Google Maps and Google Earth stand somewhere in between GIS and neogeography, in that Google Maps and Google Earth have really democratized the use of maps and the availability of, particularly, satellite imagery, which has been around for decades, but there was never any way to serve it to people easily through some technology like the web. You needed broadband even to run Google Earth, which most of us didn't have five years ago. And maps themselves are large printed documents which are incredibly detailed, require a high amount of persistence to be legible, and can be interpreted in millions of different ways. Even once you get them online, that doesn't make them any easier to discern or interpret, right? I mean, maps are a unique form of media. They're not like print, they're not like video, they're not like audio in that sense, which are much more easily delivered over the web. So OpenStreetMap has really come to a point, and I'll go into OSM a bit, where we are actually looking at maps not as a read-only medium but as a read-write medium. So, I need to go fast. Why not Photoshop, CAD or Illustrator? Most of you would know that: because they're not
actually geographic drawing tools, they're just drawing tools. All data needs to be georeferenced to be used spatially. And why not Google Maps or Google Earth? I mean, I have nothing against them, but the ownership of the data is an issue. OpenStreetMap, like Wikipedia, is community owned, and you can actually make your own maps and tell your own stories through OpenStreetMap. So, as everyone knows, this is the sort of holy trinity for those of us from the free software movement. Everything is based on free and open source software; I don't need to tell you guys about that. Open data formats and standards obviously are crucial, but even more so in the realm of GIS, because, as I said, if a government agency gives you a map which is a PDF versus giving you a shapefile, they're empowering you in a totally different way with the raw data, and we need to use these sorts of shared models and standard services to keep everything in sync. And open and public data, of course, is the foundation of this. None of what I've been able to do would have been possible just by scraping websites. Once I start actually getting tabular data in a certain form from the authorities is when the fun begins, like, say, with BEST. So these are the types, and again this is a very technical thing, just running through the different types of spatial and geographic data. There's raster imagery, which is basically pixel images. There are vector features, which are like line and point drawings. And there's tabular data, right, your lists, your tables, your spreadsheets, whatever. And the point of GIS, or of mapping, is to bring these things together, right? Every object on the map is also an entry in the database, and every object on the map is represented in only one of three ways: it's either a point, or it's a line, which is a set of points, or it's a polygon, which is a closed line. So actually they're all derivatives of one thing, which is a point, a latitude-longitude point, and those of you who use GPS
would be familiar with how that works. So everything is a combination of these three essential geographic building blocks as far as features are concerned, features, that is, geographic features. And these are the sort of file formats and standards that many of you who have done some work with geographic data, or have heard about it, would be familiar with. The file formats, of course: shapefiles, GeoTIFF; GPX is what comes off the GPS device. And these are all the open formats and web services that actually power the delivery of map data, particularly over the web but also through database servers. And this is how it all fits together, another nice technical diagram, which is actually now quite dated. This is from about four years ago, and in fact the situation has dramatically simplified in the world of open source geospatial software, so we can maybe talk more about that in a breakout session in the afternoon, those of us who have worked with it. So this is my last slide. Does anybody know who this is? Does anybody know where this is? New York. I'm really interested in New York, not only because I've lived there but because, like Mumbai, it's an island city that had a very constrained ecology and which was subject to a high amount of capitalist violence. And this guy was the main architect of that between about 1940 and 1970. His name was Robert Moses. He was the guy who built all the flyovers. He also preserved a lot of really beautiful parks. He laid the foundations for all of the bridges and everything that now connect Manhattan with Brooklyn, with Queens, with Long Island. And there's a great book about him which won the Pulitzer Prize, called The Power Broker, about how he accumulated this kind of bureaucratic authority simply because all of the agencies that he was dealing with were so scattered and fragmented that they didn't understand the city's territory and geography, and he had a really, really good and passionate understanding of maps, and he was
also a champion swimmer. He used to swim through the rivers and the Long Island Sound, actually. So he understood it to a high degree, and he used it for what is generally agreed to be a very bad thing, like basically promoting cars, keeping poor people off of the beaches, that sort of thing. So we don't like him, but he's the type of person who sits in every MMRDA, BDA, whatever type of authority: the technocrat. So, I mean, the idea is to take the power out of the hands of these people and put it in the hands of people like us. To free the map. Empower your community, whoever that may be, since community is a term that could apply to your clients, it could apply to some others, it's whoever you want, but they will be empowered. Get out of the house and explore. It's a good way to get up and go mapping. It also helps to unlock a certain knowledge that we all have about our local environments but that we often don't actually put down in writing or put down on the map: the locations of certain cultural institutions, social institutions, that sort of thing. And geodata is a public good. It's not a commodity, though it's still treated like that. Well, it was almost over anyway. So actually I'm just going to spend two minutes, can I spend two minutes just showing some stuff on the web? I'm starting with a big plug for the HasGeek crew, who organized this excellent workshop with a group of friends of mine a few weeks ago. If you guys want to know more about geodata and stuff like that and all the tools that I just very quickly ran through, have a look at this website and talk to Kiran or Zainab or anyone else who's here. This is, everything's really off-center now. Okay, so maybe this is better done in a smaller room, but I'm just going to run through very quickly some of the tools that are out there now for doing this. TileMill, which is supported by a company called Mapbox, is a really, really excellent tool for getting started with doing maps online. Just press the zoom button. Which zoom button?
Which zoom button? Control? I don't know, even I don't know that. Okay, anyway, that's much better, right? So this is TileMill. This is a little bit more of TileMill. You can actually go to this site and have a look at Mapbox.com. These are the types of maps that you can do with TileMill in a very simple way if you have, like, CSV data, you know, spreadsheets, lists, that sort of thing, and these are just some examples of thematic maps of different places that can be made with TileMill. Polymaps is also a really nice JavaScript library for styling maps. You can also have a look at this. These are all different sorts of stylings that can be done. They can be used even if you don't have a lot of spatial data; you can still do a lot in terms of actually telling stories with maps using Polymaps. This is a tool that I have helped to develop for the New York Public Library called the Map Warper, which actually allows you to take any scanned image and georeference it by pointing and clicking a bunch of points in common on a map, and then what it does is that it gives you an overlay. It allows you to overlay scanned imagery on top of actual maps. So you can see that's OpenStreetMap in the background, and this is a city survey map of Mumbai from the 1960s, which I actually found in the library in Chicago. But this is really cool and this is very helpful. I can show people how to use this later on in the afternoon. GeoDjango is a web framework, and GeoDjango is increasingly popular. It's what I used to build the property card database and what we're using for ChaloBest. This is an example of what GeoDjango can do. These are all the bus stops that I mentioned, and we're actually using the BEST database as a kind of baseline. This is all of the official data. We're adding Marathi transliterations, and we're actually adding locations to this as well, which I showed you with the other interface. Yes, and this is that stop editor interface which I also showed you. And this is OpenStreetMap. I haven't gotten this
this is the same map that I was showing you earlier, but this is actually the editor interface. And so probably in the afternoon, those of you who are not familiar with editing OSM or tags and things like that, we can go through that in a smaller group. And, okay, time to take questions. So, okay, until you settle this, right, then I'm just going to play this video. So, oh, you're going to have to disconnect this? Okay, anyway, I was going to show you the OSM change history for one year on a globe. Okay, so are there any questions? This video, I'll send the links around. I will, yeah, I'll be posting all the links and the PowerPoint that I just showed and send it around.
Sorry, is it public information? Can you put it online?
I could, but why would I? Is it public information because I got it from government? Really, do I have the rights to republish it? I don't know. That's a really good question, of what is public. If a BMC officer gives you a big spreadsheet on a pen drive, is that public information? But then the question is information derived from that data. That is one thing that I've got a clear answer on, at least as far as transport data is concerned. BEST has told us, we don't want you sharing our raw database; whatever you want to put out on your API, enhanced or augmented in your own ways, you're free to do. But I don't know about these property databases. I mean, I'm sure that every builder in the city has got it on his hard disk already.
Could you, I guess, talk about your research and the history of Mumbai? What are you trying to achieve on a larger scale? Are you trying to make data accessible, or are you trying to publish findings about geographical relationships, spatial relationships?
I think that the whole process of research is always about making interesting and unconventional connections between things, whether it's bits of data or different arguments about things. And I think that I got into this not with the idea of making it all public, which I think is
an interesting byproduct of my research, and I would like to make it public, but essentially it was a tool for me to understand the city's history. And as I said, I'm still struggling towards that, because the data is not there in a way that I could do that. In New York City, for instance, where I've worked with the New York Public Library, we have a historical GIS project that's been going on for three years, where we've got building-level histories. I mean, every building is a database also, full of stories, right, about the past. So is every building in Bombay, but we don't have a framework for doing that yet. So I think that a historical GIS would have helped me in my PhD research. Unfortunately, I have to submit by this summer, so I can't keep doing this type of stuff. But yeah, I mean, I would like to put it out as and when I'm happy with the results of the mapping.