Hi everyone, good afternoon. Thank you for coming; maybe you can call your friends, there's still some room. Okay, so I represent Delhivery, which is an eight-year-old startup based out of Gurgaon. I'm not sure how many of us have heard about it. Show of hands, anyone? Excellent. So Delhivery was named the startup of the year in August, a few weeks back; we'll be getting the award in two weeks, I think.

We're into logistics; we're now India's second-largest logistics company. Our main business, our bread and butter, comes from delivering e-commerce shipments from wherever you order to your doorstep. For example, if you order a shipment from Amazon, about 25% of them are routed through us and go to your doorstep. We are also into a bunch of other B2B services, for example distribution, full-truckload, warehousing and the works. So we provide end-to-end logistics services for customers not only in India but now also in other countries in South Asia. We have opened an office in the US, and we're expanding into markets in the Americas and Europe as well. Okay, so that's Delhivery.

In terms of volume, we do about half a million shipments a day. We are about to celebrate our 500 millionth shipment in two weeks, so that's going to be a milestone. The sale season is on, so it's a good time: we're hitting around a million shipments a day these days, and our systems are under pressure; I've been on calls since morning.

That's a quick intro about Delhivery. I am Kabir Rustogi.
I lead the data science team there. I'll use this opportunity to tell you about the key problems we are working on. I'll focus on some technical details about the problem on the board, but we'll also go through why we are doing what we're doing, and build some intuition about the main challenges we face as a logistics company and how machine learning helps us.

Okay, so I'll start with a brief introduction to our network and operations. Delhivery is largely an operations company, and what we do is simple. As soon as you place an order, we pick it up from the warehouse; it could be Amazon, Flipkart, Myntra, whoever you order from. We take it to our processing centre, where we sort it depending on where the shipment needs to go. From there it goes to a transportation hub, which could be an airport or a major truck hub that we operate. From there it sits on a truck or an airplane and goes to another hub, the destination hub. From there it goes to a delivery centre, which is the equivalent of a local post office, and from there you have bikers who deliver to your doorstep. This is essentially what we do.

Now, my PhD was in optimization algorithms; I finished it back in 2013, and I joined Delhivery in '16 after a three-year stint as a professor teaching optimization techniques and statistics. When I joined Delhivery, this was like Disneyland for me, right? There's a bunch of optimization problems here, because we're dealing with a huge network: our network consists of around 4,000 nodes and hundreds of thousands of edges. At any given time you have about 10,000-odd trucks running and 30,000 bikers. It's a massive network.
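That journey (pickup, processing centre, origin hub, destination hub, delivery centre, doorstep) can be thought of as a path through a directed graph. A toy sketch of the idea, assuming nothing about Delhivery's actual systems (all node names here are illustrative):

```python
from collections import deque

# Illustrative leg graph for one shipment's journey; the real network
# has ~4,000 nodes and many alternative routes between hubs.
LEGS = {
    "pickup": ["processing_centre"],
    "processing_centre": ["origin_hub"],
    "origin_hub": ["destination_hub"],      # truck or air leg
    "destination_hub": ["delivery_centre"],
    "delivery_centre": ["doorstep"],
}

def route(src, dst):
    """Breadth-first search for a shortest path through the leg graph."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in LEGS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route

print(route("pickup", "doorstep"))
```

In the real network each leg would carry costs (time, capacity, mode), which is where the optimization problems mentioned below come in.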
So there's a lot of room for optimization here, starting with network design problems, facility location problems, truck scheduling problems, shipment allocation problems, vehicle routing problems and the works. The problem is that none of these really work if you don't understand the ecosystem you're operating in quite well. All of these techniques, if you don't understand your ecosystem, are garbage in, garbage out, right? So you can do all sorts of optimization. I did my PhD in machine scheduling, actually, and I was very excited when I came to know of these problems. I even have a big book on machine scheduling, which came out of my thesis; Springer offered to publish it as a book. They were kind, although the readers were not kind enough: the book sold only a few hundred copies. If it had a love story, probably it would have sold more; the next one will be a love story. So the royalties were slim. I thought it would make me rich, but no.

Anyway, at Delhivery I wanted to solve these problems at scale; it was a very good opportunity for me. However, we got stuck here, because the ecosystem is very complex. It starts with location intelligence: if you are delivering 600,000 to 700,000 shipments a day, you are touching 700,000 addresses every day, which you need to physically find, deliver to and so on, so you need strong location intelligence. You also need to build customer intelligence, because customers are finicky: I'm not available on Sundays, I'm not available at nine o'clock, I'm taking a shower, I don't have cash, come back in two hours. That makes these things very tough to do. We operate a fleet of 10,000 vehicles and 30,000 bikers, and a lot of intelligence goes in there as well, to coordinate everything. We have 4,000-odd facilities, and finally, on the order of a million shipments a day. That is our entire ecosystem, along with a lot of other uncertain elements we need to deal with.

Now, once we have a very good understanding of these elements, our optimization models can work, because then you know what you're dealing with. The hardest element to build intelligence on, as it turned out, was location. In other countries, when I was doing consulting work as a professor in the UK, that wasn't a problem at all, because location and address data in the UK, and most Western countries, is very clean and very structured. Postcodes in the UK give you the lat-long of the address within 200 metres, 90 percent of the time. In India, that's not the case: addresses come in all shapes and forms. These are pictures we took on a trip to Old Delhi, in Chandni Chowk. I'm not sure who here is from Delhi, but Chandni Chowk is a very congested, very unstructured neighbourhood; it's the oldest part of Delhi. The first address I have is 1880/13-M-16. The one next door calls themselves simply M-15, without the other nice things. The one next to that was 1858-W-2. With this sort of unstructuredness in the addresses, it becomes very hard for us to understand what the customer really means. Moreover, addresses in smaller towns are even fancier: people write addresses with the names of their neighbours, next to Gupta-ji's house, next to Dr. Sharma's house, next to the old one there, all sorts of things. You have workflows as well.
I'll skip to an example. You have workflows like: on the weekend deliver to my house, on weekdays deliver to my parents' house, because that's where I spend my weekends. And there's a very nice threat which we found in one of our shipments. I have masked the necessary details here because the man seems dangerous, but what he says is: house number XX, village XX, post office, police station, district Jalandhar, rural Punjab, "I want all products which I had already been ordered should be original. Otherwise I take strict action against you, because I am CID officer from Punjab. When I last time order watch, the mirror and second hand arm was broken", Adampur, Punjab, pin code. So the guy has been quite diligent, writing everything: there's pretty much everything we need, pin code, city, a brilliant address, but also so many things we don't need. And yeah, I can't make this up.

Okay, so the good clients (I don't want to name them) have a proper structure for typing in the address. The smaller ones don't care: they want you to order, you type in any crap, they give it to us, and we are expected to deliver. There are others with nice words, with greetings to your family, and so on. So yes, addresses are painful.

Now, pin codes in India don't work. The median size of a pin code in India is 90 square kilometres; as I said, in the UK a postcode localizes you to around 200 metres. So pin codes hardly give you any ability to localize an address. Moreover, around 10% of pin codes are wrong, because people don't know their pin codes. Pin codes are something passed on through generations: your grandfather tells your father, your father tells you. There's no easy way to know what the pin code of your house is; I'm not sure, and I've been working in this area for a while. The post office gives some recommendation, but then they open a new post office as cities expand, and there's nothing in the news. You have to trust your father, really. So we noticed that 10 to 15 percent of pin codes are wrong.

Also, addresses come with very nice spelling errors. One of my favourites is "Eest", which is one edit away from East and one edit away from West; combine that with a wrong pin code and there you go. That's why we have 30,000 bikers; if this were cleaner, I'm sure we could do with fewer. Also, people write "beside" very subjectively: it could be next door, one kilometre away, five kilometres away, wherever. And of course we've gone through this.

So, what is needed? My job here is to make sense of this, so that once I know the address, once I know the lat-long, I can use all my nice optimization techniques to optimize my way there. But we got stuck in the very first chapter. There are a few solutions out there which could help. The government of India has recently been trying many different ways to standardize addresses. There are Plus Codes from Google, Robocodes from a consortium at MIT, eLoc from MapmyIndia, what3words, all sorts. The problem with these is that they are hashes of a geocode. A geocode is a latitude-longitude pair; you must know your geocode, and then the scheme converts it into some hash, an alphanumeric code (in what3words, three memorable words, and so on). But at the end of the day you must know your geocode to generate the hash, and the problem is that people don't know their geocodes. It's very hard: a lot of people don't know how to use maps, and pinpointing your geocode on a map is genuinely difficult. My wife has three degrees.
I asked her to do this; she couldn't do it. We ran an experiment: we asked about half a million people to do this over a web link that we sent them, and the average error in pinpointing the geocode was around one kilometre, which is worse than the noisy addresses that I already have.

Okay, so we decided to build a better system, something which does not rely on geocodes. We decided to learn from whatever we have. We built a system from scratch which is able to disambiguate addresses and predict the latitude, longitude, polygon, etc. of the address. That's what it looks like; that's the front end of our service. I typed in a real address, with a misspelling. What our system is able to do is figure out that it's Raheja Atlantis (notice the spelling is incorrect at the top), figure out that it's in Sector 31 (a piece of information missing from the input), Gurgaon, Haryana, India. So it's able to predict the entire hierarchy of where this address is likely to be, and predict a polygon for where Raheja Atlantis is. If you double-click on Sector 31, it gives you the polygon for Sector 31 and tells you where exactly that house is expected to be, which is the blue bubble there.

Okay, so this is what we've built. I'll talk about how this helps us and how we went about building it. This exercise is very important because it helps us bring structure to the address: you type in any crap at the top, and it will disambiguate it and provide a structure. If we get very, very good at this, then I really don't need the geocode from people. I can take their address, structurize it, and give it back to them: look, this is your actual address, perhaps you can use it next time.

Okay, so speaking of polygons: polygons are nothing but the boundaries of the localities that we operate in. Now, why is it so important for us to build polygons? Take pin codes first.
It's easier for pin codes, right, because pin codes are government-defined entities. Localities, however, are not. A lot of cities in India are very old, and people have colloquial names for localities. Next to my locality there's a neighbourhood called Char Dukan, which got its name because there were four shops there back in the '60s, and people started calling the entire neighbourhood Char Dukan. The government doesn't know this is Char Dukan; you won't find it in any legal document. But people write it quite fondly. So we need to figure out what people are writing and where those entities are. This gives a digital existence to localities, and often to pin codes as well, because pin codes keep changing: even though the government prescribes them, we don't know much about them.

Once you know what exactly your locality is, where it is, and how big it is, you can do analytics on top of it. This is a very good example; I'm not sure if the colors are nicely visible here, but the black polygon is that of a pin code in Gurgaon, and the smaller polygons you see are those of localities. These are polygons that we've built ourselves. Part of this was actually Vivek's work; he was with us until last year and has moved on to Neustadt now. Thanks, Vivek, I'll be quoting a bunch of your stuff here. The green ones are where we believe our delivery time is somewhere between zero and six minutes, the yellow ones are where our delivery time is six to twelve minutes, and the red ones are where our delivery time is twelve minutes plus. Knowing these polygons, we are able to do better analytics on how our field executives are moving on the ground: what speeds they do when they're in this polygon versus that polygon. This also helps us determine the incentives we want to give our delivery boys, and so on.

The polygons help us track our field executives as well: if I expect a delivery in Sector 31 and he's marking something in Sector 44, I can easily geofence the location and figure out what's wrong if something is amiss. And finally, once I know exactly where the place is, I can go back to my optimization techniques, reduce the number of field executives I need from 40,000 to 20,000, and that pays for my salary.

All right, so, to build all of this, the sources of data. What we have is a lot of address data: as I mentioned, we've delivered close to 450 million shipments so far, so we have 450 million strings of addresses that people have written. Along with that, we have a lot of location data captured from the mobile devices of our field executives. So of those roughly 450 million addresses, around 250 million would have been tagged with an accurate geocode when the delivery boy went to deliver. Along with that, we have a lot of open-source data, from OpenStreetMap and so on. That's the data we are talking about. In our estimate, we have delivered to around 85 million unique households so far. The definition of "unique" is fuzzy, because the same person may write the same address in ten different ways; we have an engine to unify them, but you can't be 100% sure, so 85 million is just the upper bound.

All right, so how do we do this? That's the data; now I'll talk about what we really do with it. There are two broad steps. The first is that I want to understand how localities and cities in India are structured: what are the states in India, what are the cities in India,
what are the localities in each city, including ones like Char Dukan? So the first thing we do is build a graph of all the localities. This is totally unsupervised. One way to do this is manually, but that would take a lot of time: India is a very big country with so many cities, Delhivery is live in 2,000-plus cities, and each city on average has around 10,000 localities. You can do the math; it's hard to build this manually. So what we did was start to build a graph trained on all the addresses that we see, 400 million of them. This is a snippet of what the graph looks like: India at the top, then states (I missed the city level there, but the city is supposed to be Gurgaon), then the locality, then sub-localities within the locality, then towers within the sub-localities, and so on.

How this works is that, given an address string, I tokenize each word, build n-grams from the tokens, and put them in a graph based on how those n-grams are related to each other in the address. So if I see "Sector 31" and "HUDA Market" in the same address, I am tempted to put them together and join them with an edge. If I see "Raheja Atlantis" and "Tower A" written together very frequently, I join them with an edge.

My graph essentially has three very important components. One: for each node, we have the probability that this node actually exists. Because this is unsupervised, and people write all sorts of crap in addresses, your graph is expected to have all sorts of noisy nodes, so there is one probability that tells you how likely this node is to actually exist. The second is the weight of an edge: the probability that HUDA Market is indeed a child of Sector 31. And the third tells you whether two nodes are actually the same, because "Sec 31" and "Sector 31" may be two nodes in my graph but are actually the same thing, and we must know that so we can merge them later.

Although this deserves an entire presentation, and a paper in itself, on how we actually do it, I'll quickly give the brief intuition and move on. We learn these three scores based on a lot of features that we developed. For example, the features for knowing whether a node is valid could be simple things: the length of that token, the frequency (the number of times people have written it), the geocodes associated with it, the size of the polygon associated with it. A bunch of features go into that. We have a lot of labelled data, annotated manually, so we can train our models on it and derive these probabilities using some machine learning model.

Okay, so the second part. The first part only tells me what entities exist in India, but not where they are. So now I'm talking about what this topic was supposed to be: I want to generate the polygons for each of those nodes in my graph,
which will tell me where they are, how big or small they are, what they are geographically close to, and so on. This is the high-level workflow; several pieces go into it. The first piece: an input address comes in, and we search the graph for that address. That search looks something like this: an input address goes in, it searches the graph, and it gives you the list of nodes present in the address. This graph search again deserves a session to itself, but I'll skip it, trusting that it does its job well. Once we know the set of nodes present in the address (in the previous address, for example, the set of nodes was Raheja Atlantis, Sector 31, Gurgaon, etc.), the next thing is that we go and deliver to that address: the field executive captures the lat-long of the address when they use their app.

The next thing I must do is validate whether that lat-long is correct, because oftentimes it isn't. The field executives, you know, have had a long day; they want to take a break, smoke a cigarette, and while smoking a cigarette they punch in all the deliveries together, or they go back home and punch them in there, and so on. So there's a model that assigns a score to each of the entries made by the delivery executive; again, it deserves a session to itself, so I'll move on. We keep the ones we believe are clean. As I said, we do around 500,000 deliveries a day; we accept around 300,000 data points from that, and around 40% we reject, based on anomalies that we detect in the data. Then, for a given node, we aggregate all the clean points and generate the polygon from them. That's the main intuition. Any questions here? This part is important, so we can pause.
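As a toy illustration of that "aggregate clean points, then carve out a boundary" intuition, which the talk fleshes out later: divide the city into grid cells, label each cell by a majority vote of its k nearest node-tagged delivery points, and merge the cells that vote for a node into that node's polygon. The coordinates and node IDs below are made up, and a real implementation would use a spatial index rather than a full sort per cell.

```python
import math
from collections import Counter

# Made-up (lat, lng) delivery points, each tagged with the node ID that
# the graph search assigned to its address string.
POINTS = [
    ((28.450, 77.050), "sector_31"), ((28.451, 77.052), "sector_31"),
    ((28.449, 77.051), "sector_31"), ((28.452, 77.049), "sector_31"),
    ((28.460, 77.060), "sector_40"), ((28.461, 77.062), "sector_40"),
    ((28.459, 77.061), "sector_40"), ((28.462, 77.059), "sector_40"),
]

def knn_label(centroid, k=3):
    """Majority node ID among the k delivery points nearest a cell centroid."""
    nearest = sorted(POINTS, key=lambda p: math.dist(centroid, p[0]))[:k]
    return Counter(node for _, node in nearest).most_common(1)[0][0]

# Label a coarse grid over the area; cells labelled "sector_31" would be
# merged into Sector 31's polygon, the rest into Sector 40's.
grid = [(28.448 + 0.002 * i, 77.048 + 0.002 * j)
        for i in range(8) for j in range(8)]
sector_31_cells = [c for c in grid if knn_label(c) == "sector_31"]
```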
So we can pause to No, it's not it's actually not a very big graph. So we don't really It's less than a million. Sorry No, no, no, we're we're not we tried using graphical databases where it's a standard psql and We're trying to use RDF and other things but so far a pure skill works for us, but it gives us bad performance So we're working to do better there So the time that it takes for us to search the graph is around 100 milliseconds right now We believe if you do a better job at choosing the right database 100 milliseconds for a single core So the polygons would What do you mean by fuzzy G code? Sorry No, the ones we rejected we have strong evidence to believe that they are nowhere close to Where they're supposed to be so we reject them all together If we have a sparse data if we end up rejecting everything then of course We run the model with a lower accuracy and you know We use the points which have lesser lesser accuracy as I mentioned we we provide a probability to each point So right now the threshold is at some some level But if we get very few points there we can reduce the threshold the quality of that polygon will be worse But it's better than having nothing So this spatial data is actually with all of this it's become a structured now right because this is a latitude Longitude some address string along with the node IDs that we've generated So it's need to actually we don't need to know Don't need to but I'd like to hear more about What what's Where your thought is coming from and we can discuss it offline. So not really so Sure. Okay. So point and polygon sort of things. Yes Correct. Yes for that. Yes. We we do have Certain libraries which do it very fast. I don't recall which libraries we use but someone I'll figure that answer for you and let you know I'll do that. 
So that's the next part, the last box, which is what this talk is about. I'll get to that. By the way, a node should have its own unique polygon, but yes, there can be overlapping polygons, if that's what you mean; we'll get to that. So once we go and physically deliver, we capture the geocode from the field executive; this is just a manual data point that we capture, which is used in all subsequent steps.

All right, so now the final question: how do we create these polygons? What we have is a certain node ID, say Sector 31, and for Sector 31 we have all the clean data points that we believe were captured legitimately. Now the job is to create some boundary for Sector 31. It's challenging, because even though we do some pre-processing on the data, it can still be quite noisy, for multiple reasons. The field executive could have made a manual error, as I said, while smoking a cigarette, and we may not have caught it; the ones we catch we are sure are invalid, but a bunch of them we can't catch. When you deliver in a building there's bad GPS signal, so even though the guy has done a good job, because the phone snaps to the closest cell tower and so on, you can still get a bad geocode. The other thing that can screw things up is the node identification. When I call Addfix, which is the graph search tool we have, it's possible that it screws up; it does, in three to four percent of cases, identifying the wrong node. That happens many times when people write very bad addresses: they write their locality and they also write the locality next to theirs, just trying to be helpful, but it doesn't really help. They try to put in as much information as possible, which confuses our systems in about two to three percent of cases.

As a result, you see something like this. The red ones are the points where we believe the node ID is, let's say, Sector 31; the green ones are points where the node ID is something else, say Sector 40. You do see some criss-crossing: some red ones here, some green ones somewhere else. Okay, this is a bad example, but the intuition is that you have a mix of dots and crosses, and you want to draw a decision boundary to separate Sector 31 and Sector 40.

What is a node? A node is a locality, or a city; each of these is a node. It could also be a rooftop: for example, if someone has delivered to Plot 5 in HUDA Market many times, we would have a node for that as well. But normally door numbers are quite noisy; people write door numbers in all shapes and forms.

Okay, so the intuition here is that we need to create a decision boundary so that we are able to physically separate Sector 31 and Sector 40. I like this example a lot, because in this case decision boundaries are actual boundaries on a map, which is very cool. Yes. So the red polygon is what you get after the decision boundary was created: what we first have is just the red dots and the green dots, and then we ran some ML algorithm, some classifier, which divided the space into two. We'll come to this in a bit; it'll be covered.
If not, we'll take the question again. Okay, now let's formulate this as a machine learning problem. Our feature vector is the latitude and longitude, and the label is the locality, the node ID, which we predicted from our graph search. Getting the features was easy, but to get the label we had to build that entire graph, search it, and so on. Once we have the data in this form, we should be able to apply any non-linear classifier to build a nice decision boundary separating the two regions.

The output we're expecting could be of two types, to answer an earlier question: overlapping locality polygons, or non-overlapping ones. The overlapping case is harder, because then there is no clean separation between two localities: you can have the same feature vector with multiple labels, which obviously makes things harder. For the sake of intuition I'll stick to non-overlapping polygons; to get a sense of these, think of pin codes, since all pin codes are expected to be mutually exclusive.

So what we do now is this. As shown in the earlier picture, we have certain red points and certain green points, and we try to create a decision boundary. In our case we tried using k-nearest neighbours (KNN), decision trees, and neural networks; something as simple as KNN works fine, with k somewhere close to 15. We divide the entire city into small grid cells. For the centroid of each cell we find the closest k neighbours among our points, and the majority node ID among those neighbours becomes the predicted label for that cell. Once I know the predicted label for each cell, all I need to do is merge the cells together and get a clean polygon. This one is the polygon for pin code 600006, which is somewhere in Bombay, I think. Sorry. Okay.

Now we do some post-processing: we run a concave-hull algorithm on the points so that we trim away the empty areas and get a tighter polygon. Just to give you a sense of how accurate these polygons are: the red one here is from Google Maps, and the green one is from authentic government sources, which we believe is the source of truth; at least for pin-code polygons, government sources can be relied upon. In terms of precision and recall: if you take the green polygon as the source of truth, recall can be seen as the fraction of the green polygon covered by our polygon, and precision as the fraction of the points inside our polygon that actually belong to it; the red dots inside compromise your precision, so the fewer red dots in your polygon, the higher the precision.

Okay, so these are improvements, and that's the final polygon we get. I can do better at fine-tuning; that's where road networks come in. For locality polygons you would naturally expect a road to divide locality A from locality B. So instead of going with arbitrary grid cells like we did earlier, we can choose cells bounded by road segments. We took OpenStreetMap's entire road database (Vivek did a lot of work here) and got all the cells that the road intersections create. Again, you use the same simple KNN technique: take the centroid of the cell, check the k nearest neighbours, and assign the label accordingly. This gives us a cleaner boundary, as seen here; you can see that it does follow the roads. You still see some jaggedness, which is because of the noise, and we're working on other techniques to reduce that.

So that's how we get better. The final slide here is a few observations and a few weaknesses of this approach. The first is that we need a classifier that is fairly non-linear: the more non-linear the model, the better the curves of our polygon will be. You don't want very linear algorithms; even something like a decision tree doesn't work very well, because it gives piecewise axis-aligned separations. For overlapping localities this is quite painful, because with overlapping localities, for a given dot there can be two possible labels: a given dot could be in Sector 31 and also in a sub-locality within Sector 31. To take this example, I may have a polygon for this hotel, or whatever it's called, Brigade-something, and this location also lies in Rajajinagar. So the same (x, y) has two labels, and when I'm training the classifier I take not just the top predicted label but the top two in this case. Yes. So the earlier technique works only if all polygons are mutually exclusive, but that's not real life; you have hierarchical structures. In that case every point may have multiple labels, which means you may want to output multiple labels. The problem with taking the top two labels is that a lot of noise may creep in, so these models need a lot of fine-tuning, and we are working on this. This one is an example of locality polygons, hence the noise you see there: localities by nature are not mutually exclusive; there's a lot of overlap, a lot of parent-child relationships. Pin codes by nature are mutually exclusive,
so we didn't face that much of a problem with pin codes, but for localities we do. Finally, the last caveat is that all of these polygons represent what people think their localities are: when a person writes "Sector 31", we believe he's in Sector 31. It's possible that our polygons are smaller than the actual government-defined polygons, because they come only from our data set; what we see are only those areas people are ordering from and we're delivering to. So even if the government says the polygon for 600006 is 30 square kilometres, we may only have about 15 square kilometres of it, because that's where people live. All right, that's it from me. You can find more information on our tech blog. Thank you. Any questions?

[Audience Q&A]

We do build polygons for pretty much every city we deliver to; for the purposes of a presentation, though, I showed the cleanest ones, for the bigger cities, because we have cleaner data there. For our internal purposes we don't need to put them on a slide; they're jagged, they're bad-looking, but they work.

Your second question was about blacklisted areas; what does that mean? Yes, that goes into the analytics part I spoke about earlier. There could be areas where, in general, customers are harder to deliver to, or there are too many returns, or there's simply too much violence or abuse, whatever. I can classify that; the harder part is identifying which areas those are. Once I do that, I can do all sorts of analytics and assign attributes to my localities, even to addresses. Yes, yes. So that depends on the use case: if I'm looking for something, say I want to see my returns over the last three months in this area, I fetch that data and build it. Thank you.