Good afternoon, everybody. Today I am going to talk briefly about a concept called Geohash. Can I have a quick show of hands: how many of you already know about it, or have at least heard of it? If I had to give a Twitter pitch for Geohash, it would be just this line: you encode a latitude-longitude pair into a string. That's what Geohash is all about.

So what's the plan today? We'll briefly look at what Geohash is, why we need such a system, and the common implementation of Geohash, the geohash.org web service. How does it work? How do we get the nearby and zooming properties, and how do we do nearby searches? I'll also show a small demo that I mashed up yesterday.

And who am I? My name is Sandeep. I co-founded a company called Ideophone with Sundar and Anand, both of whom are here. We have built a couple of mobile applications for people who commute.

So, what is Geohash? Geohash is a simple way of encoding a latitude-longitude pair into a string; it's a compact string encoding of geographic coordinates. For example, if I asked what the lat-long of Bangalore is, I'm sure most of us would not know the value. Instead I can just say that the Geohash for Bangalore is tdr1. That's it. Bangalore is something like tdr1wx. It's a hierarchical system: the entire latitude-longitude space is divided into rectangular grids, and as the string length keeps increasing, the precision keeps going deeper. It's available in the public domain and was invented by Gustavo Niemeyer.

So why do we need it when we have lat-long? The basic problem, which most of the speakers mentioned in the morning, is that we usually have big addresses. For this Katanama event, for example, we had such a big address, and it's not standard. And a latitude-longitude is very difficult to make sense of unless you have a map.
Instead, the mail could have just said that the venue is this particular string, which is 8 or 9 characters long. As I keep increasing the string length, the accuracy increases and you can pinpoint the location. You can also access it via the web service, which I'll come to later.

It subdivides space into grid-shaped, rectangular buckets. So it doesn't really mean that you can get down to point level. When we say that Bangalore is tdr1, it just means that that particular lat-long is within that boundary. That's all; it's not a point. And it's a hierarchical structure, of course, as I already said: the longer the geohash, the smaller the cell and the higher the precision.

I'll show a small demo here, which I forked and am using. I search for the Energy and Research Institute, say I want a level of 5, and plot it. The place we searched for is here, and this is the center cell. It also shows the neighboring geohashes. If I increase the level and go one level deeper, the Energy and Research Institute is almost in the center box, and these are the neighbors. As you can see, most of the neighboring geohashes share a common prefix.

So how do we get to Bangalore being tdr1, or Domlur being tdr1wx? The way it works is that you keep dividing the map into smaller grids, horizontally and vertically, alternately, and you keep zooming in. This is the particular place that I marked as a red point. You keep going, and you get a very long code in binary. Then, with base-32 encoding, you get the final code.

geohash.org is one of the reasons why this Geohash system was made. It was done by the person who started this project.
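The subdivision process described above (alternating longitude and latitude splits, then base-32 encoding of the resulting bits) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: the function name `encode` and the sample coordinates for central Bangalore are assumptions made here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o


def encode(lat, lon, length=6):
    """Encode a latitude/longitude pair as a geohash of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # the first split is along longitude, then the axes alternate
    while len(bits) < 5 * length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)  # point is in the eastern half
                lon_lo = mid
            else:
                bits.append(0)  # point is in the western half
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)  # point is in the northern half
                lat_lo = mid
            else:
                bits.append(0)  # point is in the southern half
                lat_hi = mid
        use_lon = not use_lon

    # Every group of 5 bits becomes one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(BASE32[value])
    return "".join(chars)


# Assumed sample coordinates for central Bangalore (not taken from the talk).
print(encode(12.9716, 77.5946, 4))  # tdr1
```

Asking for more characters only appends to the result, which is why a longer geohash always starts with the shorter one for the same point.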
And through geohash.org, when you go to geohash.org/tdr1, it takes you directly to a map centered on that particular geohash code. It also gives you links to OSM and Google Maps. So this is an easy way to put a location in emails or on Twitter. It's like a shortener for the entire latitude-longitude system, and it becomes very easy to reference a location on websites. For example, you can just say "geohash.org/tdr1, the meeting is there."

Because of the way the geohash system is made, with grids in a hierarchy, one major advantage is being able to group things easily. For example, this particular address, Main Road, Domlur Stage, has this particular geohash. Imagine I remove the last three characters of this geohash value. I go up to a coarser level, which tells me this is the Domlur-level area. If I remove a little more, say four characters, I reach the Bangalore level, and tdr1 is the entire area, Maloo, Bangalore, all of that together. So this way it becomes very easy to group things.

And because of the way it's hierarchically structured, nearby or proximity searches become very easy: nearby places usually have similar prefixes, and a long common prefix tends to mean that two places are near each other. But there are exceptions, the edge cases. Say two places are very near each other in Boston, but at the third level of division itself they fall into two different buckets. So the code for one place starts with drt and for the other with drm. At that point they don't show up as neighbors if you're doing a very simple string-based search. But there are libraries that do this geohashing, give you the ability to search for neighbors and the bounding box, and handle these edge cases.

So again, just to understand the coding part: TERI is this particular big code.
What I want everybody to notice here is that tdr1w is common to TERI and the Bangalore International Centre. But CIS, which is pretty close, has a different code after tdr1, just because it's in another bucket. And our office, which is actually on Indiranagar 80 Feet Road: we and CIS share the initial five characters, and only the end differs. So even if I didn't know where my district is, I can still figure out that tdr1w is Bangalore. With Chennai, it's just the starting part that is common with Bangalore. And something on the opposite side of the globe has a totally different code.

Most languages have libraries for working with geohashes: encoding, decoding, finding bounding boxes, finding nearby neighbors. For example, bounding box: if I give a particular geohash and ask for the bounding box, it gives me the lat-long of all the corner points. If I ask for just the neighbors, it tells me all the nearby geohashes. And if I do expand, it gives me all the neighbors, including the cell I'm asking about.

To find nearby places with this system there are two methods: one is proximity search, which is a bottom-up approach, and the other is the bounding box. Imagine I have a lot of points of interest around this area, I'm starting from this point, and I want to find how many grids away they are. In the proximity search, I start with that point and keep zooming out, that is, decreasing the precision. Whenever some point falls into the same geohash, I can say, okay, this one is this many levels apart. That's the bottom-up approach. In the other one, you keep expanding the bounding boxes: you go one level, then for each of those you go to the next level, and so on. So it's just a choice of which approach you want to follow.

In one of the products that we are building at our company, we have a lot of points of interest.
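The three library operations mentioned above (bounding box, neighbors, expand) can be sketched without any third-party package. The function names below mirror the terms used in the talk but are assumptions, not the API of any particular library; a real library may compute neighbors differently.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, length):
    """Compact geohash encoder: alternate lon/lat splits, 5 bits per character."""
    ranges = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    point = {"lat": lat, "lon": lon}
    value, out = 0, []
    for i in range(5 * length):
        axis = "lon" if i % 2 == 0 else "lat"
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        bit = 1 if point[axis] >= mid else 0
        ranges[axis][1 - bit] = mid  # bit 1 raises the lower bound, bit 0 lowers the upper
        value = (value << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[value])
            value = 0
    return "".join(out)


def bounding_box(geohash):
    """Return (lat_min, lat_max, lon_min, lon_max) of the cell."""
    ranges = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    i = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            axis = "lon" if i % 2 == 0 else "lat"
            bit = (value >> shift) & 1
            lo, hi = ranges[axis]
            ranges[axis][1 - bit] = (lo + hi) / 2
            i += 1
    return (*ranges["lat"], *ranges["lon"])


def neighbors(geohash):
    """Return the (up to 8) cells that touch the given cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounding_box(geohash)
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    found = []
    for dy in (1, 0, -1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            lat = lat_c + dy * height
            if not -90.0 < lat < 90.0:
                continue  # there is no cell beyond the poles
            lon = (lon_c + dx * width + 180.0) % 360.0 - 180.0  # wrap at +/-180
            found.append(encode(lat, lon, len(geohash)))
    return found


def expand(geohash):
    """The cell itself plus its neighbors."""
    return [geohash] + neighbors(geohash)


print(bounding_box("tdr1"))
print(expand("tdr1"))
```

Stepping one cell width or height from the center and re-encoding is the simplest way to find neighbors; it also shows why a plain prefix comparison is not enough, because a neighbor across a coarser cell boundary no longer shares the full prefix.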
So imagine I have the lat-long of all these places and I want to quickly find the nearby points of interest. We used to have an approach where we computed the distances from the lat-longs, but it's not very efficient. Instead, we just computed the geohash values for all the given lat-longs, and in the SQL query we just do a LIKE. It also depends on the application context: this is about proximity, what is near. It's not exactly for distance computation.

So from the data itself I can identify nearby places. Say I search for points of interest near San Francisco, which is this particular 9q8y. For all these places, at least 9q8 is the common part, so they are actually all nearby places. You can either do a prefix match in a simple LIKE query, or you can do the geohash expand on the particular value and keep going until you find the place.

The major limitations are, of course, locality anomalies, the edge cases at the cell boundaries, which you have to handle separately. The bounding box process itself is a bit more computationally intensive. And there is the projection-based model: Raju was talking in the morning about these grids, the difference in grid sizes between the polar region and the equatorial region. That will be a problem here too, and the cell sizes themselves will be different.

This is a mashup I did yesterday. These are points of interest relevant to our application; imagine that we are trying to find something near Manhattan. This is the highest level that I am at. I increase the precision and run it again. It zooms in to that level, and the set narrows down to the points which are at that particular level. Then I go further and keep doing that. So you get the idea.
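The LIKE-based prefix match and the bottom-up search described above can be sketched with Python's built-in `sqlite3` module. The table name `poi`, its columns, the helper `nearby`, and the sample rows are all hypothetical; the geohash values are illustrative strings, not data from the talk.

```python
import sqlite3

# Hypothetical points of interest with precomputed geohashes (illustrative values).
rows = [
    ("a", "9q8yyk"),
    ("b", "9q8yvf"),
    ("c", "9q9p1d"),
    ("d", "dr5ru7"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poi (name TEXT, geohash TEXT)")
conn.execute("CREATE INDEX idx_poi_geohash ON poi (geohash)")
conn.executemany("INSERT INTO poi VALUES (?, ?)", rows)


def nearby(conn, geohash, wanted=2, min_prefix=3):
    """Shorten the prefix one character at a time until enough rows match."""
    found = []
    for n in range(len(geohash), min_prefix - 1, -1):
        prefix = geohash[:n]
        found = conn.execute(
            "SELECT name, geohash FROM poi WHERE geohash LIKE ? ORDER BY name",
            (prefix + "%",),
        ).fetchall()
        if len(found) >= wanted:
            return prefix, found
    # Not enough matches even at the coarsest allowed level: return what we have.
    return geohash[:min_prefix], found


prefix, matches = nearby(conn, "9q8yyz", wanted=2)
print(prefix, matches)
```

The `wanted` parameter plays the role of "give me five points of interest near me", and `min_prefix` is the level at which the search gives up. Because this only compares prefixes, it inherits the boundary edge cases mentioned in the talk; querying the expanded set of neighboring cells would be one way to cover them.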
So you can build that into your algorithm, where you say up to what level you want to go. Here, say, I just want to find five points of interest which are near me. So I just give that, and that's it, basically.

So it's a simple and effective system to map a complex lat-long into a very easily computable string. It's useful in applications where nearness matters more than the actual distance or navigation. So it depends on the application context, basically. And it enables queries to do the access with LIKE and so on, which costs very little in terms of performance. I think the last bit is the important part is that you put a little bit at my scale. Yeah, otherwise yeah. Okay, thank you.

Then I have used this on distance. So any other encoding would also give the prefix match. It takes a long life. This thing works for both lat and long in one string. Let's take an example. We can use 72 point something and reduce it to, like, 4 decimals. Then how do you find 71 point something? 72.0 and 71.9 are very close, but you can't do it with a prefix match anymore. Otherwise it's just a string of paths. No, but that problem still exists here. It's much better than this. Performance lies in the applications. So take that 72 point example. That's a decimal. You convert it into base 32 or base 36 or whatever. Then it reduces this problem again. That's exactly what this thing is. So any encoding would likely give this. Why this? And how is this? The thing is, it's a well-executed format. And the useful thing with this is that lat and long are treated equally: every bit alternates between lat and long. Which means that if you want to move from 72, 8 to 72.1, 8.1, it's a very similar distance in terms of the actual hash. In fact, it effectively utilizes something called carry trees at the back end. So the two-color kind of thing. So because you have two axes, Neanderthes has to take both in dot com. That is where it was used by Ray.
So one, along the vertical axis, you compute left or right, that is 0 or 1. And along the horizontal axis, you do up or down. And then MySQL has this geospatial engine. MySQL does not have that. That's what the MySQL has. They just have one. That's the last thing I was saying.

Who here is using OpenStreetMap? OpenStreetMap has a short link. It uses this. So it does not make up a custom code. Take Google, for instance: it makes a short code and saves it to the database. In this case, it does not make a custom code. It just puts the hash there. So it will be osm.org slash the hash. Yeah, I am asking which one comes with it. That I am not sure.

The bounding boxes that you showed, they are fixed in some sense? No, that depends on the length of the string itself. The longer the string, the smaller the area. So you are zooming into that particular layer. When you just say tdr1, it's at a very high level: Bangalore, Bangarpet, everything comes together in there.

I have a problem that I want to solve. If I have the lat-long of an MLA constituency in Bangalore, with this, can I find the neighbors, like the exact neighboring constituencies which share an edge with that constituency? Will I be able to find something like that? You can do an approximate neighborhood kind of thing, but since the boundaries are not rectangular, you will not get a precise answer if you have only one point per constituency. But if the zoom level is enough for me to... But then that also is not constant, right? Bangalore may be a bigger district than some other place. Basically you can filter the data set and very effectively work with the smaller set. That's one thing, but the limitation will still be that it is a rectangular grid.
So if your constituency is not close to a rectangle, you need a lot of grid points. Instead of having just one point per constituency, if you had all the smaller grid cells that define that constituency, in that case you could do that. You can group those smaller ones. If somebody sees a problem... But it's not a problem of the rectangular grid. It's a problem of the constituency boundary. It's not existing in the data. You cannot find the neighbors like that, just given one point. Exactly. So that data is not here.

Yeah, but then you have neighbors there, right? So you have one rectangle where your constituency is situated. Another X, Y could be there in the neighboring tile. So if you just give 30 kilometers as a radius, you can get all of them, which may not belong to the current rectangle but instead belong to the other rectangles. So then you get all those constituencies, or whatever your POI is, within a 30 kilometer radius. That's what I'm wondering if I can do. That shape is not in the data. If you just have X1, Y1, X2, Y2, it is a simple calculation. As for the performance with geohashing, I'm not sure about that. Otherwise, it's a simple distance calculation. You have two points, X1, Y1 and X2, Y2: simple distance calculation. It works with India because you're very close to the equator. Once you get really far north or south, it stops working. No, all the navigations normally, but the nearby surrounding this one works only with this one. Geohashing is not used, as far as I know. But if you have a... Yeah, it's not used as far as I know.

Performance when you have such a long string? Yeah, so... One question. I think you're right. Just take the rest of the readings of the... I can try it, of course. Just... Yeah, somewhat related to what she's asking.
So this can be useful in proximity searches. For example, I want to search for some events within 10 miles of this particular zip code. So is there a balance between decimal places and distance? Yeah, you can do that, and the library supports that. What precision you want when you encode the lat-long itself, you can keep that as an option. For example, up to 5 decimal places is like 100 kilometers. Like some 7 decimal places. Yeah, you can set the precision level. More questions?
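As a rough guide to the precision question above, the size of a geohash cell follows directly from its length: each character adds 5 bits, split alternately between longitude and latitude. The sketch below computes cell sizes under that rule. The function names are assumptions, and the kilometer conversion uses an approximate 111.32 km per degree on a spherical Earth, so the figures are estimates only.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def cell_size_degrees(length):
    """Return (height, width) of a geohash cell in degrees."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


def cell_size_km(length, latitude=0.0):
    """Approximate (height, width) in km; width shrinks away from the equator."""
    height_deg, width_deg = cell_size_degrees(length)
    height_km = height_deg * KM_PER_DEGREE
    width_km = width_deg * KM_PER_DEGREE * math.cos(math.radians(latitude))
    return height_km, width_km


for n in range(1, 9):
    h, w = cell_size_km(n, latitude=13.0)  # 13 degrees north, roughly Bangalore
    print(f"{n} chars: about {h:.3f} km high, {w:.3f} km wide")
```

The `latitude` argument also illustrates the limitation raised in the talk: the same geohash length covers a narrower strip of ground near the poles than near the equator.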