Good afternoon, everybody. My name is Swami Nathini, and I'm here representing the data sciences group at Ola Cabs. It's a delight to be here today at this first [inaudible] happening in India, and I think we'll be meeting all of you. I'd like to thank the sponsors and the organizers for this.
The purpose of this talk is to give you an idea of the problems that the data sciences group at Ola Cabs is working on, to tell you at a high level about the techniques and algorithms that we use, and to try and see if we are able to pique your interest. This is a really short presentation, so I'm going to touch upon as many areas as possible in this time. If any of you would like a deeper discussion on any of these problems, please do reach out to me after the presentation. We also have a few data scientists from Ola Cabs here today [inaudible], so please do reach out to them or to me in case you want to have any deeper discussions.
Ola is today one of the largest B2C service providers in the country, so it doesn't require too much of an introduction in that sense. But I still wanted to highlight a couple of points about Ola which I think not many people would be aware of. One is the scale at which we operate. We have a fleet of close to a million vehicles today, across all the types of vehicles that we have: compact cars, sedans, luxury vehicles, auto rickshaws, and buses and shuttles as well. The reason we operate such a large fleet is to cater to an audience of several million active users who use our service on a recurring basis. We have a presence in all of the tier one cities in the country and in most of the tier two cities, so we operate across the length of the country.
That scale brings with it opportunities for optimization, of which there are plenty. The data sciences group at Ola is interested in optimizing this business and making it more efficient for the customers and for our driver partners.
The second thing that I wanted to highlight is the nature of the demand that we see here, especially viewed from the perspective of affordability. There is a wide range of demand available. For instance, today our default taxi, the Ola Mini service, is priced at 10 rupees per kilometer, which is the price in most of the tier one and even tier two cities. This is affordable for a common man in the cities of India. Whereas if you go towards the tier two cities or even smaller towns, 10 rupees per kilometer is not an affordable price point over there. So there we are looking at services like Ola Share, which allows the user to programmatically share his ride with a friend or someone that he knows, and which would essentially halve the price to 5 rupees per kilometer. At that price point you have more of the audience coming in. Then you go even lower and look at people who use buses for their daily commute, and they travel at a price point of about 1 to 2 rupees per kilometer. At Ola we have a service known as Ola Shuttle, which we are just launching. Over here, if you have a bus which can seat 40 people and it is fully occupied, then even if each person pays 1 rupee per kilometer, that's about 40 rupees per kilometer, which is enough to run the bus sustainably. The reason I wanted to make this point is that we are only scratching the surface of the demand for transportation in the country.
Every time we launch a new service, no matter how small the start is, as with Shuttle and Share, we find that it immediately unlocks a new set of people, a new pool of demand, which was not visible earlier. These people were not engaging with us when we were offering our service at 10 rupees per kilometer, and suddenly they become visible. The spectrum is bottom heavy, which means that we are yet to reach most of the people in India. That is one of the missions behind operating in this logistics sector at Ola.
We also have other business lines. We have Ola Money, which is a currency that allows cross-selling between the different products that we have. We have Ola Cafe, which is a food delivery service, Ola Store, which is a groceries and home needs delivery service, and so on.
So what is the data that we generate? Every vehicle that we have on the road is a source of data for us. Every taxi sends a beacon to one of our servers once every few seconds, and it shares a wide range of information, including accelerometer readings, odometer readings, GPS location, status codes and so on. This helps us, for instance, assess the level of prevailing traffic on a road at any point in time: given the signals coming from our vehicles, we are able to model the traffic at that time on that particular road. We are also able to derive relationships and understand traffic flow better. For example, we know that if a particular road is going to get clogged at 5 p.m., then some portion of that traffic, say 25%, will go down a second road and clog it at 6 p.m.
Because that road would not be able to withstand even one-fourth of that traffic. These are some of the traffic relationships which help us forecast traffic. With this, we are able to make our service more efficient in terms of giving better ETAs and faster alternate routes. So that is the main source of data. Apart from it, we also have customer data in terms of who is using the app, the ride details and so on. Of course, we give utmost importance to user privacy, and all this data is used only for improving the efficiency of the business for the customer and for no other purpose. We also have data from the other business lines that I mentioned, which helps us understand the user in more depth.
I just thought I'd give an overview of the kind of problems that we work on here. Pricing is a very important problem, like I mentioned earlier, and there are different flavors of pricing. There is surge pricing, which you would have noticed if you have used Ola recently, which increases the price to alleviate the mismatch between demand and supply. It works along the lines of how airline pricing works: when you have more demand and less supply, you surge the price automatically.
Apart from surge pricing, pricing itself is a fairly deep area. You could price the service at 10 rupees per kilometer and allow it to surge up to 2x, which gives you a maximum price of 20 rupees. An alternate philosophy is to set the base price at 5 rupees per kilometer and allow a surge of up to 2x. These two philosophies will produce quite different reactions in the audience. For instance, the latter philosophy will ensure that your demand gets smoothened out across time, I mean that there will be a lot of people who would wait for the surge price to drop and then take the ride at a much lower price point.
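As a rough illustration of the surge mechanism described above, the sketch below clamps a multiplier between 1x and a cap and applies it to a base fare. The rule that the multiplier follows the demand-to-supply ratio, and the names surge_multiplier and price_per_km, are assumptions made for this example and not Ola's actual pricing logic. Only the base fares and the 2x cap come from the talk.

```python
def surge_multiplier(demand: int, supply: int, cap: float = 2.0) -> float:
    """Hypothetical rule: follow the demand/supply ratio, clamped to [1, cap]."""
    if supply <= 0:
        # No cabs available at all: charge the maximum allowed surge.
        return cap
    return max(1.0, min(cap, demand / supply))


def price_per_km(base: float, demand: int, supply: int, cap: float = 2.0) -> float:
    """Fare per kilometer after applying the (assumed) surge rule."""
    return base * surge_multiplier(demand, supply, cap)


if __name__ == "__main__":
    # Philosophy 1 from the talk: 10 rupees base, surge up to 2x.
    # Philosophy 2 from the talk: 5 rupees base, surge up to 2x.
    for base in (10.0, 5.0):
        for demand, supply in ((50, 100), (150, 100), (500, 100)):
            fare = price_per_km(base, demand, supply)
            print(f"base={base} demand={demand} supply={supply} fare={fare}")
```

With a 10 rupee base and a 2x cap the fare never exceeds 20 rupees per kilometer, which matches the maximum quoted in the talk.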
With that, you are able to influence the traffic flow in the city to some extent. And at the scale that we are at, we are able to ensure better times of arrival, faster and safer arrivals, for users who travel there. So pricing is a prominent area.
Forecasting is another area, wherein we would like to know what is going to happen in the future. The example I gave about relationships in traffic is a very important thing that needs to be forecasted. We also forecast demand from users so that we can properly source supply, and we forecast supply in order to see if we are able to meet that demand.
Essentially, we are running a marketplace where demand is represented by the customers who want to use our service, and supply is the drivers that we work with. We, the business, are the manager of the marketplace. Any marketplace initiative would impact both supply and demand, typically in opposite directions, so we have to ensure that we hit a sweet spot between supply and demand. For instance, if you increase the price a lot, then supply would go up and drivers would be happy, but users would be turned off. Going in the other direction would result in the opposite reaction. So it's important to find the sweet spot of every initiative. And given a bunch of initiatives, say a new pricing strategy, a new sourcing strategy and a new incentive strategy, how do you ensure that all these strategies work together and that the balance in the marketplace is not tilted? The moment the balance in the marketplace is tilted, one side would defect to the competition. And the moment you start losing supply, you lose the demand as well. So managing this tilt is a very important part of the problem that we work on.
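The sweet spot between supply and demand can be pictured with a toy model. The sketch below assumes made-up linear curves (demand falls with price, supply rises with price) and searches a grid of prices for the one that maximizes completed rides, taken as the smaller of demand and supply. The curves, the numbers and the function names are all hypothetical and only illustrate the idea.

```python
def demand(price: int) -> int:
    """Hypothetical demand curve: fewer ride requests as the price rises."""
    return max(0, 1000 - 50 * price)


def supply(price: int) -> int:
    """Hypothetical supply curve: more drivers on the road as the price rises."""
    return max(0, 40 * price)


def completed_rides(price: int) -> int:
    """A ride happens only when a request meets a driver."""
    return min(demand(price), supply(price))


def best_price(candidates):
    """Return (price, rides) for the candidate price with the most completed rides."""
    best = max(candidates, key=completed_rides)
    return best, completed_rides(best)


if __name__ == "__main__":
    price, rides = best_price(range(1, 21))
    print(f"sweet spot at {price} rupees/km with {rides} completed rides")
```

Below the sweet spot the marketplace is short of drivers, and above it the marketplace is short of riders, which is the tilt the talk describes.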
Forecasting helps us here because it gives us a view into the future and helps us foresee what is going to happen.
ETA is a very sensitive metric. ETA stands for estimated time of arrival. Whenever you book a taxi on the Ola app, it shows you the time in minutes that the driver is going to take to reach you. This is a very sensitive metric because we know that the moment this ETA promise is violated, our cancellation rates spike. The moment you cross that ETA, the probability of cancellation just shoots up exponentially. So it's a very business-sensitive metric, in the sense that you are able to satisfy your demand better and manage expectations better if you are able to predict ETA correctly. Predicting ETA requires an understanding of traffic, which I talked about earlier.
There is also guaranteeing the arrival time. When you have a sharing service, for instance, when you want to share your ride programmatically with someone else, it's important that there is a bound on the time at which you would reach your destination. Your route cannot be deflected too much to accommodate the other person. So guaranteeing an arrival time is another area for which you require ETA prediction.
Destination prediction is important in the sense that today we don't ask for the user's destination. Except in the case of very specific destination points like airports, we don't really ask the user where he wants to travel. The reason is that we want to keep the service as unbiased as possible. We don't want to bias it in favor of people who are going to travel longer distances or whatever. Irrespective of the destination, you get the same quality of service. So destination prediction is important because it allows us to foresee where the cab is going to end up.
And potentially the cab can take a booking even before it actually ends up there. So estimating the supply pipeline helps us ensure that we are able to show the user the real supply that is available.
Some of the other areas that we work on: programmatic incentives, that is, incentives that are given out to drivers in order to ensure that they stay on the platform. We have more work on the driver side. We have predicting attrition, for instance, of drivers as well as users. Once you figure out from a driver's usage pattern that he is no longer as active as he used to be on the network, you immediately try and give him a better package or better incentives so that he resumes working with us. The same thing applies to users as well. It's important that when a user is slipping away, we use the right set of incentives to make sure that he comes back on board.
Marketplace optimization is something that I talked about. We have some critical metrics which are set forth by the business, ETA being one of them, and conversion rate on surge pricing being another. When we do all of this across the spectrum of projects, it's important to ensure that these critical metrics are preserved. Each project could pull a metric in some direction, so it's important to have the right balance between supply-centric initiatives and demand-centric initiatives. That is something that we handle in an automated fashion.
Driver rating: as I said, we get a lot of data from the taxis that we have on the road. We get data from the driver's accelerometer, which helps us understand his driving profile, how risk-taking he is and so on. And also, every time you complete a ride, you rate the driver.
The driver rates you as well. So to build a driver rating system, the straightforward way is to say that drivers who get rated high by customers who are themselves rated high are actually good drivers. That amounts to doing something like PageRank on the ratings.
[inaudible] is something else that I think I talked about, where you tie in the driver. Today we have initiatives where we source cars for some of the drivers. Drivers are pretty interesting because this is a new classification of employment that we have created. These are self-made entrepreneurs who are not strictly tied to any service provider, and they are able to sustain themselves by getting access to demand in a very programmatic and predictable way. Earlier, the driver ecosystem was not properly managed in the country. In one month, when there is a tourism season, you are running at peak, and the next month it just tails off. So people were not really sure if being a driver is a good long-term occupation. By programmatically providing them a channel and a source of demand, we have ensured that our driver ecosystem is sustainable.
User intent prediction is another thing, where we try to understand as much as possible about the user. The moment we realize that there is a need [inaudible], we can address that also.
Very quickly on some of the models that we use. For regression we use GLMs. There are a lot of regression activities, like predicting what the conversion rate would be at a certain surge price, in a certain location, at a certain time and so on. Pricing is very sensitive to location, and it is also sensitive to the category of service, because there is a relationship across categories.
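To make the conversion-rate regression concrete, here is a minimal sketch using logistic regression, which is one member of the GLM family (binomial response, logit link). The talk does not say which GLM is used, so this choice is an assumption. The observations are invented, the function names are hypothetical, and the fit uses plain gradient ascent rather than any particular library.

```python
import math

# Invented aggregated data: (surge multiplier, price quotes shown, bookings made).
OBSERVATIONS = [
    (1.0, 100, 80),
    (1.25, 100, 65),
    (1.5, 100, 50),
    (1.75, 100, 35),
    (2.0, 100, 20),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(observations, learning_rate=1.0, steps=5000):
    """Fit P(book) = sigmoid(b0 + b1 * surge) by gradient ascent on the
    binomial log-likelihood, averaged over all price quotes."""
    b0 = b1 = 0.0
    total = sum(n for _, n, _ in observations)
    for _ in range(steps):
        g0 = g1 = 0.0
        for surge, shown, booked in observations:
            # Observed bookings minus the bookings the current model expects.
            residual = booked - shown * sigmoid(b0 + b1 * surge)
            g0 += residual
            g1 += residual * surge
        b0 += learning_rate * g0 / total
        b1 += learning_rate * g1 / total
    return b0, b1


def predict(b0: float, b1: float, surge: float) -> float:
    """Predicted conversion rate at a given surge multiplier."""
    return sigmoid(b0 + b1 * surge)


if __name__ == "__main__":
    b0, b1 = fit_logistic(OBSERVATIONS)
    for surge in (1.0, 1.4, 1.8):
        print(f"surge {surge}x -> predicted conversion {predict(b0, b1, surge):.2f}")
```

A fitted curve like this lets you estimate the conversion rate at a surge multiplier that was never actually tried, such as 1.4x here. In practice location, time and category would enter as additional features.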
When you price one category at 1.5x of the other, there is a certain relationship, a certain flow between them, whereas if you bridge that gap, the flow would be better. So understanding what the optimal price point is, given a category, a time and a location, is very important, and that's one of the most important regression problems that we work on. We have other problems as well, like regressing ETA, regressing demand, and regressing the probability of demand for a given user.
We have a combination of categorical and numerical data which we use to train these models. We decouple the categorical attributes and use tree-based models like gradient boosted trees and random forests to handle the categorical attributes. Then, once you have partitioned the data using your trees, on each of the resulting partitions you apply your numerical techniques as well.
Time series modeling is important for forecasting. We have time series models like ARIMA. We also look at the Fourier coefficients and try to predict into the future. Large dimensionality is, again, taken care of by standard techniques like PCA or regression.
As I said, with every taxi being a data source, there is a lot of data that gets collected, and hence most of the algorithms, in fact all of them, run on distributed systems. We use distributed systems like Hadoop, which we use a lot, and also Spark, which is picking up. One of the reasons why we are here is to try and see how Julia will be able to help us in these initiatives.
For unsupervised learning, given that we have categorical data, we try to use our tree models to do clustering, and we also use k-means, and so on and so forth.
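One simple reading of "look at the Fourier coefficients and try to predict into the future" is sketched below: take the discrete Fourier transform of the history, keep the few strongest coefficients, and evaluate the resulting sum of sinusoids at future time steps. This is an illustrative assumption about the technique and not a description of Ola's forecasting system. The function names and the synthetic hourly demand series are invented.

```python
import cmath
import math


def dft(values):
    """Plain O(n^2) discrete Fourier transform, enough for a small example."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fourier_forecast(history, horizon, keep=3):
    """Extrapolate `horizon` steps using the `keep` largest-magnitude coefficients.

    Caveats: the extrapolation is periodic with period len(history), and for a
    real-valued series `keep` should not split a conjugate pair of coefficients,
    because only the real part of the reconstruction is returned.
    """
    n = len(history)
    coeffs = dft(history)
    strongest = sorted(range(n), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    forecast = []
    for t in range(n, n + horizon):
        value = sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in strongest) / n
        forecast.append(value.real)
    return forecast


if __name__ == "__main__":
    # Synthetic hourly demand with a daily cycle, observed for two days.
    history = [100 + 30 * math.sin(2 * math.pi * t / 24) for t in range(48)]
    print([round(v, 1) for v in fourier_forecast(history, horizon=6)])
```

Keeping only a few coefficients acts as a smoother, since it retains the dominant cycles (daily or weekly, say) and discards the rest as noise.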
For the graph problem, to model the nature of traffic, we have a large distributed graph which is powered by Apache Giraph, an open source large graph processing system. We use it to handle graphs with tens of millions of nodes, where each node represents some point of significance, and we train our ETA models on top of this.
We also have constrained optimization, which is even more important when we manage the marketplace, because there are business-sensitive metrics which have to be preserved. As I said, when we have these initiatives and experiments happening, how do you ensure that the marketplace balance is not tilted? So that is the other thing.
I think I'm basically done in a minute. I came with more material even though it's a short talk. So do you have any questions?
The problem that you described is a classic perishable inventory problem, wherein you want to forecast the demand, and on top of that, if the demand is greater than the supply, you want to optimize that demand. In the last slide, the last block that you mentioned was about constrained optimization. Is that what you were referring to? I mean, is that what you're using for optimization?
That is one of the things. Ultimately, it's a constrained optimization problem. Everything is a constrained optimization problem. But the individual constituents of this, the objective function and the constraints, have to be predicted because they are not already known. So it's important that we train models, for example, to predict what the conversion rate would be if I surge the price by a certain factor. That is something that might never have been tried before, so it's important to predict it. So it's constrained optimization with the individual constituents plugged in via prediction. That's the way it works.
So just to add on to that one.
So essentially it's a cascaded model, where one set of models handles a subset of problems, which finally funnel into your LP, which gets solved and then you make sure that you can roll it out. So it's a multi-level cascaded problem, in some sense. At the bottom, you have the constrained optimization, and you have a cascade of models.
Just one last question. The LP, are you distributing that at all?
The LP we're not distributing.
How large is the LP?
We have a number of constraints running to about [inaudible].
Did you arrive at the price of 10 rupees by optimization?
It was initially arrived at by trial and error. Right now we are doing some amount of work here, but we cannot fully optimize it, and we cannot have a price like 8.7. We have to ensure that our prices are round numbers. But there is some amount of experimentation happening, and that's the reason why you find that across cities the price varies a lot. That would be the highest level of optimization.
Related to that question, you said trial and error. Are you doing systematic experiment design?
Yes. I mean, when the company was started, of course it was trial and error. There was no data sciences group at that point. But right now, yes, there are some systematic levels at which we can set the price. Of course, business sustainability is important, but within those restrictions there are some levels which we play with. And it gets more complicated because pricing one category at a certain point would have an impact on the other categories. For example, if you make your compact cars very cheap, then the sedan drivers would be upset and they would defect. So it's important to ensure that this is sort of...
That's where I was going. It's actually a machine that is running, and there are side effects and also the...
It's not just restricted to itself. It extends to our other business lines.
It extends to auto rickshaws, for instance. If you price the Mini too low, then people who use auto rickshaws defect to it. At the same time, people who are using sedans also defect to it, so both sides defect. So it's important to get the price spectrum right across the range of services, and also to take care of surge prices. When the Mini surges, it could approach the price of a sedan. So what is the market reaction over there? Interestingly, people are still not defecting to the sedan. That's something that we know. Even when the Mini, after applying the multiplier of 1.5 or whatever, exceeds the price of a sedan, people are still booking the Mini. And if they don't find a Mini, they just go away.
So I hope any charges... Thank you.
So the activity that the user does on the Ola app, we record it. So we add anything that we can add. When do you open the app? When do you want to do it? When do you use other online services? These are signals about the individual.
You have a question? Any other questions?
Do you get to know when I open Uber and then use the Ola app? LAUGHTER
You've got this big icon on your hand, I can't say it, but we know that. LAUGHTER
Yes. LAUGHTER
Thank you. All right, thank you. So please do talk to me if you want to have any deeper discussion. I'll be with the team, which is sitting there. APPLAUSE