I am Sunil Rathi, and I work with Swiggy as a senior data scientist. Today we are going to discuss predicting order batchability. This is the flow of the presentation. We will start with the important terms, which you need in order to follow the presentation. Then we will move to the introduction, batching, the traditional order flow, what the next version of the order flow should be, the machine learning batching prediction algorithms for first order and second order, then our machine learning model, and then the experiment.
Batching. Batching simply means delivering two or more orders in a single trip by a single delivery executive. I think you all must have experienced Uber Pool or Ola Share; it is a similar concept.
First order. Say you have n orders in your batch. If you sort all the orders by order time, then the order with the minimum order time is the first order, and all the rest of the orders we will call second orders.
Predicted service level agreement, or PSLA. This is the estimated time conveyed to the customer for how long the delivery will take. Service level agreement, or SLA, is the actual time taken to deliver the food.
Compliant orders. Compliant orders are those where we meet our promise, basically where the actual SLA is less than or equal to the predicted SLA.
Net promoter score, or NPS, is a business metric that we generally use to track how the business is doing. It comes from customer feedback.
Before we start the presentation, let's first discuss why we even need this. Every company is doing batching: Ola Share is doing it, Uber is doing it, and the rest of the companies are also doing it. But why do we need this prediction? I will tell you a short story so that you are better aligned with why we are doing it. This big why. I have a friend, Jagmeet, who generally orders food from Swiggy. As usual, one day he was hungry.
He went to the Swiggy platform. He went to his favorite restaurant. He picked the items and added them to the cart. At the cart we showed the predicted SLA that we discussed. We showed 33 minutes. He was happy with it: 33 minutes, okay, I'm fine with that. He placed the order. Everything was fine. But the order was in peak hours, and in the dinner peak at that. At this time we have a lot of orders. To handle the load at peak hours, we generally batch orders, such that they meet the necessary conditions and the customer experience isn't hampered. So Jagmeet's order got batched with order O2, say from customer C2. Now the delivery executive had to wait extra to pick up the second order. He went ahead and delivered C2's food first rather than Jagmeet's, and then he moved on and delivered to Jagmeet. Overall, all these operations took around 46 minutes. We promised 33 minutes. We were unable to meet our promise. We felt bad. We apologized to the customer. We took all the actions we generally take when we don't meet our promises.
Next day, we were sitting together as a data science team, brainstorming why we were not able to meet the promise. We looked at the data for batched and non-batched orders, and we found that this happens more frequently with batched orders than with non-batched orders. We asked the question within the team: why are we not able to meet the promise? There was a simple answer: because at the time of prediction we don't know that this order is going to be batched in the future. Fair answer. Right answer. But is it a solution? No. Let's go through how we built the solution.
Let's compare with a normal load scenario. Here we have enough delivery executives in comparison to the number of orders. It's a normal load. So a delivery executive can pick up a single order and deliver it to the customer. But what will happen at high load?
At high load, there will be delivery executives who are busy delivering food. So these two guys are busy. Now just consider a simple scenario where there are two orders. We know the pigeonhole principle, right? It says that if you have M buckets and N items, and the number of items is greater than the number of buckets, then some bucket will have more than one item. Same here. We have fewer delivery executives than orders, so some delivery executives have to deliver more than one order in a single trip. And that is batching. So this delivery guy has to pick up the orders from restaurants R1 and R2 and then deliver them to customers C1 and C2.
So we discussed what batching is. But why are we doing it? There should be some perks. We discussed that it handles load at peak hours. It gives better utilization and payout to the delivery executive. Take the simplest scenario, where there are two orders in a batch. If in the general case you are paying something like 50 rupees, then for the same trip he gets extra money at the cost of a little extra time: he has to move from customer C1 to customer C2, and if it's multi-restaurant, from R1 to R2 as well. It's a small time in comparison to the total trip. From the business side, it reduces the cost per delivery. Say for the first order you are paying 50 rupees, and you say that if an order gets batched with this order you will pay something like 30 or 40 rupees. Say 40 rupees. Now, per delivery, if the order is batched, you are saving 10 rupees. So it's a win-win situation for both the business and the delivery executives.
It increases speed for the second order. At high load, what is the problem? We have fewer delivery executives in comparison to the number of orders.
So if we don't do batching, the order will wait at the restaurant until a delivery executive delivers food to another customer and then comes back. That takes a lot of time. If you just batch it, he will pick it up at the cost of the preparation time of the second order, which can be small. So overall it increases the speed of the second order.
Let's look at the traditional order flow. When you choose some items and add them to the cart, you are at the cart, where we predict the SLA; we predict the delivery time. If the customer is happy, he will place the order. When this order comes into our environment, we check whether it can be batched with some existing order or not. If it can, we assign the same delivery executive; else we assign a new delivery executive, and then it follows the rest of the steps. If you look closely, this prediction of the SLA happens before batching. It's the same Jagmeet story we discussed: the answer was simple, that at the time of prediction we don't know that this order is going to be batched.
Now look at the numbers; they will tell you where we were going wrong. A few weeks after deploying batching, we compared the numbers between batched and non-batched orders. The compliance of batched orders was 20% lower than that of non-batched orders. Batching was a major contributor to bad orders; bad orders are orders whose delivery time is more than an hour. There were higher chances of the customer contacting customer care: because you are not able to meet your promise, just after the estimated time they will contact customer care to ask, where is my order?
So we looked at how the numbers can impact the business. The first thing is repeat. If you are not able to meet your promise, the customer will not be happy, and he or she will not order food as frequently as a happy customer. So the repeat rate can change; it can decrease. And NPS.
When we compared the numbers between batched and non-batched orders, there was a 16-point difference between them. So even with all the perks of batching, the business was going in the negative direction because of the customer experience, because we were not able to meet our promises. Customer churn. If you keep on breaking your promises, there can come a point at which the customer looks for another platform. You can lose your loyal customers. So, to put it politely, we should respect our promises.
So we looked at the previous version of the order flow and found that there are some issues, even with the perks of batching. So how should we do it? Let's look at it. When the order is at the cart, we check whether this current order can be batched with an existing order or not. If it can, it becomes a second order, and second-order batching prediction will handle it. In that case we know the environment, we know which order it is going to be batched with, and we can predict the right SLA. If it doesn't batch with an existing order, then it becomes a first order, and here comes the role of first-order batching prediction, our core machine learning algorithm. It predicts whether future batching is possible for this order or not, because there is no other order in the system right now that it can batch with. Is future batching possible or not? If it is, we recalculate the SLA. Else we show the same SLA as predicted by our time prediction algorithm.
So in the context of batching prediction, we handle the second order and the first order differently, because they behave differently. For the second order we already know which order it will be batched with; for the first order we don't. We will take second-order batching prediction first, because it is much more trivial than first-order batching prediction.
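
The revised cart-time flow described above can be summarized in a short sketch. It is illustrative only: `find_partner`, `predict_sla` and `will_batch_later` are hypothetical callables standing in for the batching-condition check, the time prediction algorithm and the first-order batching model, and none of these names come from the talk.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def promise_at_cart(
    order: Any,
    open_orders: Sequence[Any],
    find_partner: Callable[[Any, Sequence[Any]], Optional[Any]],
    predict_sla: Callable[[Any, Optional[Any], bool], float],
    will_batch_later: Callable[[Any], bool],
) -> Tuple[str, float]:
    """Return (role, promised minutes) for an order that is still at the cart.

    All three callables are placeholders (assumptions), not real Swiggy APIs.
    """
    partner = find_partner(order, open_orders)
    if partner is not None:
        # Second order: the partner is already known, so the SLA can use it.
        return "second", predict_sla(order, partner, True)
    if will_batch_later(order):
        # First order that is likely to be batched with a future order.
        return "first_batchable", predict_sla(order, None, True)
    # First order that is expected to be delivered on its own.
    return "first_unbatched", predict_sla(order, None, False)
```

The point of the structure is that the batching decision is made before the SLA is shown, so the promise already reflects whether the order is expected to share a trip.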
When the order is at the cart, you have all the information about your environment: all the orders, all the batches. With the current order, you can check whether it can be batched with an existing order or not by running all your conditions. If it can be batched, you know which order it will be batched with. At this point you have all the information: what the customer-to-customer distance will be, what the restaurant-to-restaurant distance will be, what the wait time will be, all these properties. You pass this information to your time prediction algorithms and you can calculate the right SLA. If it can't be batched with an existing order, you pass it to first-order batching prediction.
First-order batching prediction. Now the order didn't batch with an existing order, but it can batch with some future order. So how do you identify that there can be a future order that will meet all the necessary conditions and batch with this order? First-order batching prediction mainly depends on demand prediction, batching conditions, order-level predictions and historical features. We will go through them one by one.
Before that, let's discuss what a geohash is, because some of these features depend on it. We will go through it quickly. Geohash is a public-domain geocoding system. It takes a space, divides it into grids, and keeps on dividing, so that you get higher precision. These are some of the precisions you can work with. We are working at precision 6, that is 610 by 610 meters, for the first version of our first-order batching prediction algorithm. A geohash is just a string: the longer the string, the higher the precision and the smaller the cell.
Demand prediction. We have an order at the cart. We want to identify whether there can be a second order that can be batched with this order.
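
For reference, the geohash scheme mentioned above can be written in a few lines. This is a minimal sketch of the standard public geohash encoding (alternating longitude and latitude bisections, five bits per base-32 character), not code from the talk; the function name and the example coordinates are arbitrary choices for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # the encoding starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# A shorter hash is always a prefix of a longer one for the same point.
print(geohash_encode(57.64911, 10.40744, 6))  # u4pruy
print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Because each extra character only subdivides the previous cell, two customers whose precision-6 strings are equal fall in the same grid cell, which is what makes the string usable as a key for per-cell features.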
So we are now depending on the future; we have to predict the future. From the demand prediction perspective, we need to find restaurant future orders. This order comes in at restaurant R1: can there be future orders for R1? Say this order comes at time T. Our demand prediction algorithm will tell you whether, between T and T plus delta T, there can be future orders. If there is a good number of future orders, then there are higher chances of batching. This delta T you have to set according to customer experience; you have to look at the data. It can be 10 minutes or 15 minutes; it also depends on the predicted SLA of the order. If it is only 10 minutes, you can say that for this order we can deliver in 25 minutes, 15 minutes, something like that.
Same for customer geohash. If you can identify that there will be a good number of future orders, you can find the order density for that particular geohash. We already discussed that we are working with geohashes of 610 by 610 meters. So if the order comes from that geohash, it can meet one of the necessary conditions, customer-to-customer distance. We will discuss what customer-to-customer distance is. Then customer geohash to restaurant future orders: from a particular geohash to a particular restaurant, what will the future orders be? If that is a good number, the customer-to-customer distance will be small, the restaurant-to-restaurant distance will also be small, so there will be high chances of batching.
Batching conditions. These are basically put in place by business or operations so that we have a better customer experience. They can vary from geography to geography: what works in, say, Bangalore will not work in Ambala, and vice versa. The major contributions are from customer-to-customer distance. Customer-to-customer distance is how far apart two customers can be such that their orders can still be batched.
So if you say that the customer-to-customer distance is one kilometer, that means two customers can be at most one kilometer apart.
Max order delivery time. This is the maximum time that should be taken by all the orders in a batch. If you set something like 55 minutes, then all the orders should be delivered within 55 minutes. For max order delivery time, as with customer-to-customer distance, you have to look at the data to find the sweet spot. What value you set should be different for high load and for normal load. Normally you choose a value where the customer experience is right, so you can choose something like 45 minutes.
Then there are item-level conditions. Some items can't be batched. Take the example of ice cream: if you batch it, it can melt. These particular items you have to treat differently.
Order-level predictions. We know that we have a current order at the cart. For this order we can find the necessary properties, like what the preparation time will be and what the predicted SLA will be. Using this information we can identify some of the batching properties, because this order is part of the batch. A major contribution comes from order preparation time. Same example: if it's ice cream, they just give you a scoop and you can have it, so its preparation time is less than a minute. Within this time window, within this minute, there are lower chances of batching, because there will not be many future orders. Then the order's service level agreement, that is, the predicted SLA. If you set the maximum delivery time at, say, 55 minutes, and this order's PSLA is around 52 minutes, then there are lower chances, because if you batch another order with it you will probably not meet the necessary condition. Part of it is item-level constraints.
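
The distance and time conditions described so far can be illustrated with a small check. This is a sketch under assumptions: it uses straight-line (haversine) distance, although the talk does not say whether distance is measured that way or along roads; `extra_min`, the extra time batching adds to an order, is a hypothetical input; and the 1 km and 55 minute defaults are only the example values mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def can_batch(
    customer_1: tuple,
    customer_2: tuple,
    psla_min: float,
    extra_min: float,
    max_c2c_km: float = 1.0,
    max_delivery_min: float = 55.0,
) -> bool:
    """Check customer-to-customer distance and max order delivery time."""
    close_enough = haversine_km(*customer_1, *customer_2) <= max_c2c_km
    # An order whose PSLA is already near the cap leaves no room for a partner.
    within_time = psla_min + extra_min <= max_delivery_min
    return close_enough and within_time


c1 = (12.9716, 77.5946)  # arbitrary example coordinates
c2 = (12.9750, 77.5946)
print(can_batch(c1, c2, psla_min=30, extra_min=10))  # True
print(can_batch(c1, c2, psla_min=52, extra_min=10))  # False
```

The second call mirrors the example from the talk: with a 55 minute cap, an order already promised at around 52 minutes has little headroom, so batching it is unlikely to satisfy the condition.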
These are again specific to items: something can be batched only with pizza, that kind of thing. Ice cream can't be batched. Ice cream can only be batched with ice cream, or it should become the second order; only then can it be batched. So these are some of the conditions.
Historical features. History tells you about the future. If you look at the history, you can identify what was going on, so that you can predict something about the future. Restaurant batching ratio: if you know, by looking a few days, weeks or months back, that a particular restaurant is popular in this order bucket for this particular time interval, then you can guess that this restaurant can have a high batching ratio at this time. For example, people generally order biryani at the dinner peak, so restaurants serving biryani have a high restaurant batching ratio for the dinner peak. Same for customer geohash. For customer geohash batching ratio, take an example: say you have a customer lat-long; from that you can find the geohash. If this geohash falls in an office complex area, then there are higher chances that the order will be batched in the lunch peak rather than the dinner peak, because by then people will have moved to residential areas, to their homes. And customer past experience: we check what the customer's experience was in the previous five or ten deliveries. If we were not able to meet our promises, then our batching prediction algorithm handles it by saying that you shouldn't batch this. Assign a new delivery executive. Let's give the customer a really good experience.
This slide is specifically for data scientists. We experimented with different models: random forest, boosting. We came up with the deep learning model.
The highlights of the model: we had around 11 million data points. The number of features used was around 58, which we already discussed, plus some extra features from our platform. The number of layers used was 8. We used ReLU as the activation function for the hidden layers, Adam as the optimizer, and the loss function was binary cross-entropy. We checked the results and they were good. Overall, for the first version, it was precision 90%, recall 84% and F1 score 86%. So we were happy with the results.
We deployed this model to some of the zones in Hyderabad, and for this version we didn't make any change on the time prediction algorithm side. What we do is this: say your time prediction algorithm predicts 30 minutes for the current order, and your batching prediction algorithm says that this order can be batched in the future. Then we just increase it by 10 minutes. So instead of saying 30 minutes, we tell the customer that it will take 40 minutes. We integrated this model using our internal data science platform.
The initial question from the business side was: now you are adding 10 minutes at the cart, so there are chances of a conversion drop. We told them that we are not adding 10 minutes to every order. It is only for the orders that are highly likely to be batched in the future that we have to add 10 minutes, because otherwise we will not be able to meet the promise. This will be a small percentage. They gave us the go-ahead. We deployed in some zones of Hyderabad. We found that there was no drop in conversion, because people understand that this is a dinner peak, this is a lunch peak, and it generally takes this much time. Even when we were saying that it would take less time, they didn't know that it would take a lot longer. So with the longer promise, there was no conversion drop. Compliance for batched orders improved by 15% because of better promises now.
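
The cart-time adjustment and the compliance metric used in this experiment can be sketched as follows. This is an illustration, not the deployed code: the 0.5 probability threshold and the function names are assumptions, while the 10 minute buffer and the definition of compliance (actual SLA less than or equal to the promised SLA) come from the talk.

```python
from typing import Sequence


def promised_minutes(
    base_psla: float,
    batch_probability: float,
    threshold: float = 0.5,   # assumed cut-off for "highly probable" batching
    buffer_min: float = 10.0,  # the fixed buffer mentioned in the talk
) -> float:
    """Add a fixed buffer to the PSLA only for orders likely to be batched."""
    if batch_probability >= threshold:
        return base_psla + buffer_min
    return base_psla


def compliance_rate(promised: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of orders delivered within the promised time."""
    pairs = list(zip(promised, actual))
    if not pairs:
        return 0.0
    met = sum(1 for p, a in pairs if a <= p)
    return met / len(pairs)


print(promised_minutes(30, 0.9))  # 40.0
print(promised_minutes(30, 0.1))  # 30
```

Only orders above the threshold get the longer promise, which is why the adjustment touches a small share of carts while directly targeting the orders that were missing their PSLA.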
Instead of PSLA, for these batched orders we are saying PSLA plus 10. The NPS and customer churn exercises are going on, and we expect the results in a few weeks; when we get them, we can share them with the community. We already have a second version of this model which has better results, and we expect that it can give us an additional jump of 10%. We will deploy it in the next one or two weeks. So that's all from my side. I'm open for questions. I think no one slept.
How do you gather the data on restaurant delivery and food prep behavior? Is it item-wise?
So it is not item-wise; it's order-wise. We get the items as a feature of the order, and we get the preparation time for the order. We have in-house machine learning algorithms that predict the preparation time. We already have the data, right? We have the prep signal, mark-for-ready, from the restaurant side, so we know from history how much time it took. We can use this data to predict what will happen in the future. We told the restaurants and the delivery executives that whenever you are picking up the order, you have to mark it. This data goes into our database and then we use it. For a new restaurant, we have the cuisine properties, so the restaurant will fall into one of the clusters, and from that we can identify it.
They want this for you. You have those 30 minutes or free orders, right? 30 minutes or free. So I ordered once. I felt like it came in like 31, 32 minutes. And I got a notification: your order was delivered within 30 minutes. So next time I timed it.
It's on the reach, not on the delivery.
What is that?
Say you are living in a complex; then it's the first entry gate.
But I'm living in an independent house.
Then it's still with you.
And next time I started a stopwatch; it came in 31 minutes and it was still not free.
We can look into your issue.
Actually, I'm at the other end of the spectrum. I ordered a 30-minute delivery once and I got it free both times.
We'll look into yours as well.
You were mentioning giving a degraded SLA, a slightly degraded SLA, for places where you kind of have to batch the orders. You told me that you did not see any drop-off in terms of the customers. And what's the drop? Okay, a slight drop. But did you try incentivizing those customers to probably...
We are thinking about this experiment: if we know that an order can be batched in the future, we will give some incentive, some discount.
You haven't tried it out yet?
We are doing this experiment. It's at the POC stage.
Hi, one question here. You were talking about increasing the promised time for orders that you are planning to batch. Have you considered offering it as a new service offering for customers, wherein, for orders where they are willing to wait longer, you would levy a lesser delivery charge, no surge and so on?
That exercise we are also doing. In a couple of months, you will see that we give some incentives to customers who are ready for, say, one-hour delivery or something like that.
Yeah, hi. One more question. A key component of your SLA also involves the restaurant keeping its part of the deal, that it also meets its SLA. But they might be on multiple platforms. They might have their own customers who just walked into the restaurant. And the number of chefs for that day might be less. So how do you account for that?
The operations guys keep discussing with these restaurants, telling them that if you get the signal, you should adhere to the prep time which we are proposing to you. We already discussed the mark-for-ready kind of thing. And we actually penalize the restaurants which don't follow our instructions. You can see it on the listing side.
Hi. So I wanted to ask a question about the machine learning aspect of your model.
You talked about the model parameters. I wanted to ask what kinds of models you tried and how you arrived at the current model as the one that works best for you.
We experimented with random forest, boosting algorithms and a deep learning model. We have a lot of data, collected over around three months. The deep learning model was working fine with a good set of parameters. For the parameters, too, we tried different sets.