Up front, I would like to say that I won't be talking about anything you don't already know. What this short talk will try to do is bring everything together into a framework, what we call system thinking, for how data scientists need to think about business problems. As a brief introduction, I work for Ola, in the Ola data science team. Ola doesn't need an introduction; I think at least some of you would have come here in an Ola. At the scale Ola works at, in hundreds of cities and now internationally as well, enabling millions of rides, it's very important to make the right decisions to give a seamless experience to the customer as well as consistent earnings to the driver. Unless data science comes into the picture, making efficient decisions at that scale becomes almost impossible. With that, let me start with the popular understanding of data science. For most people it's about predictions, it's about deep learning, it's about human-like abilities of machines. But if you think of what data science is for a business like Ola, it goes much beyond predictions. Predictions are just one part of a bigger system. Suppose we have a cool model that can predict whether it will rain tomorrow or not. Fine. What do you do with it? Unless you act upon that prediction in some way, mere predictions will not enable a business to thrive. So let's start with something called the drivetrain approach, which was outlined back in 2012 by Jeremy Howard. I think it gives a very good starting point for how data scientists need to think about business. Lots of companies have been collecting data over the years; now that hardware, computing, everything has become cheaper, it's easier to collect data. And there are two kinds of companies. The first kind starts by saying: I have collected a lot of data. What do I do with it now?
How can I leverage this data for my business? The other kind of company starts the other way, saying: there is this set of problems I need to solve, this set of decisions I need to take. I'm taking them today anyway. Can I leverage the data to take the same decisions in a better manner? There's nothing right or wrong about either approach. It's just that in the first approach, it's like you're on a safari in a jungle without a map: you're just exploring to see if you get to sight something. In the other, you know exactly where you want to get to, you have a map with the destination, and you're just traveling to it. What the second approach enables is the pace at which data science can help a business. There is so much hype about what AI, ML and data science can do that, especially in fast-growing businesses, you dream that data science or ML can do a lot of things for you. And if it doesn't deliver fast enough, then maybe it all looks like just hype. So the second approach helps by first defining an objective: what is the goal the business is trying to achieve? Then you come to the levers: what are the levers under my control, or the decisions I can make today, to achieve that goal? There could be many levers here, not just one; what are the different decisions I can make to achieve the same goal? Then you come to the data: for these levers, do I have the data that enables me to exercise them? And finally come the models. As data scientists, we always start by thinking about models first, but the model is actually the last mile in this journey, where you walk backwards from the goal and finally arrive at the model you want to build. Let's take some examples; it will probably become clearer.
If we take the example of Google, what do you think the goal was when Google started? Anyone? Yeah, that's their mission statement. But practically speaking, what were they trying to achieve for Google users? Maybe just to enable users to get the information they are searching for. And what levers did Google have to enable this? What could they control? When a user enters a query, what they could control is the order in which they display results to you. That's one lever they could control in getting you the information. The data they had for that was all the different websites out there and all the hyperlinks between them. And then came the PageRank model. So that's an example of what we spoke about on the previous slide. Let's take another one. Facebook: all of us use it, and some of us are probably doing so even during the sessions. If we think about it the same way, what Facebook would have started off with is just keeping users engaged on the app. That's the goal. The levers are about what you show to users: the content in the news feed, or your friend suggestions. That's what you are able to control. The data you have is basically how users interact with your system, all the activity. And then the model comes into the picture, where you're trying to detect communities or build recommender systems and so on. Now, most business problems, especially for marketplaces that are aggregators, and this is a complex statement, are multi-agent, resource-constrained, multi-objective optimization problems with multiple levers. This will probably become clearer when we talk about examples of challenges in my world at Ola. So why is it challenging?
Think about other aggregators that are similar, two-sided marketplaces. The challenge with ride sharing is this: if you're searching for something in an information retrieval system, the URLs don't disappear after one person searches. But in ride sharing, the cab disappears after one person gets the booking. There's no infinite supply from that perspective. If we talk about a retail marketplace, where you're ordering products, the products don't turn down a customer, saying "I won't be delivered to this customer", and products don't rate customers. Whereas in our case, drivers are rating customers and customers are rating drivers. There are human agents involved on both sides. If you think of a healthcare marketplace, doctors don't move from place to place after appointments; they are at one location. Whereas here the cabs move from location to location every time you allocate them. That's just to give a flavor of it. What this requires data scientists to do is the topic of today's talk: evolving from model thinking to system thinking. As data scientists, we are all used to model thinking, where we think of model metrics: am I maximizing the likelihood here? What's the error of my model? Accuracy, precision-recall curves; we look at many such things. But when we think of what these models are in place to do for a business, we need to evolve into what we call system thinking, where it's not just one model but many models coming together to solve one optimization problem for the business. That optimization problem has multiple agents involved, which means you need to cater to multiple objectives, and you are also exercising multiple levers to do that. And each of these is important in different contexts. Let me give you an example.
For example, if you are booking a cab, all of us want the lowest ETA, right? We want the cab to be right here at any moment. So maybe there is a cab standing just next to us. If I allocate it, maybe the customer will be very happy. But there's also another angle: maybe this driver has just finished a booking, and there's another driver, maybe 10 or 15 minutes away, who has been idle for half an hour. As a platform that is two-sided, it's also important to cater to the objective that drivers need to be utilized all the time. That's just an example of how even the single optimization we are solving becomes difficult in terms of quantifying each of these. So if we think of it as a system, we need a model to predict customer behavior, how the customer responds to the experience. We also need a model to predict driver behavior, how the driver responds to idle time. And the two come together to make the decision as to which driver I should match to this customer. So let's try to put all the different models we need to build for a ride-sharing business against objectives, and walk the path backwards as we discussed. As a business, the objectives are very clear. You need to be a profitable business. But for that profit to be sustainable, we need to enable a good customer experience and a good driver experience. If these are the objectives, what are the levers you would have as a platform? Any guesses as to what is in our control that we can change? Pricing, yes. Yes, availability. Availability in the sense of which car we can use for what? Yeah. Okay. Yes, allotment, or assigning rides to cars. Routing. Right. So these are the multiple levers we have: pricing, what offers you give to customers, what incentives you give to drivers, how you route the cabs, how you allot them, and how you maintain a mix of different kinds of cars, like sedans and hatchbacks.
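The matching trade-off described above, between the customer's ETA and the driver's idle time, can be pictured with a minimal sketch. This is not Ola's allocation logic: the `Driver` fields, the linear `match_score`, and the weights are all hypothetical and chosen only for illustration. In a real system the weights would presumably be informed by the customer and driver behavior models rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    eta_min: float   # minutes for this driver to reach the customer
    idle_min: float  # minutes since this driver's last trip ended


def match_score(driver: Driver, eta_weight: float, idle_weight: float) -> float:
    """Hypothetical score, lower is better.

    A long pickup ETA hurts the customer, so it adds to the score.
    A long idle time hurts the driver, so it subtracts from the score,
    making an idle driver more likely to be picked.
    """
    return eta_weight * driver.eta_min - idle_weight * driver.idle_min


def pick_driver(drivers, eta_weight: float = 1.0, idle_weight: float = 0.5) -> Driver:
    """Return the driver with the lowest combined score."""
    return min(drivers, key=lambda d: match_score(d, eta_weight, idle_weight))


# The two drivers from the example: one right next to the customer who has
# just finished a booking, one 12 minutes away who has been idle for 30.
drivers = [
    Driver("nearby", eta_min=1, idle_min=0),
    Driver("idle", eta_min=12, idle_min=30),
]

customer_only = pick_driver(drivers, idle_weight=0.0)  # optimise ETA alone
two_sided = pick_driver(drivers)                       # weigh both objectives
print(customer_only.name, two_sided.name)
```

With the idle-time weight at zero the nearby cab wins; once driver idle time is given some weight, the decision can flip to the driver who has been waiting. How large that weight should be is exactly the quantification problem the talk points to.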
So all these are under our control. And what exactly is the data that we get? Yes, location data, mostly from the customer apps and driver apps, and some third-party data. Location, yeah, that also comes from the customer app. So these are the different channels from which we acquire the data. Now if we think of models, in most businesses there are broadly four kinds of models that you build. One is behavior prediction: how my customers or my drivers behave. The second is volume prediction: how many total customers do you expect today, or how many drivers will turn up for work today. So in the stack of models we need to build, there is customer behavior, driver behavior, volume of demand and volume of supply. There is also an aspect of location here. A location may be just a latitude and longitude, but the intent of the ride is very important. Let me give you the example of a spot in Indiranagar, where the Indiranagar metro station is. There is also a Toit there. When somebody gets down from a cab there on a Friday evening, they're probably going to Toit; on a Monday morning, they're going to the metro station. And if someone travels there every day, maybe they have their house just behind it. So how exactly do you tag a location to get the intent of travel? Only if you get the intent will you be able to predict the volume, whether you can expect more demand from there or not. So the type of location is very important, and we need models to tag these location types. And then simulation, because there are so many second-order and third-order effects. In the ride-sharing example, one particular decision we make here translates into many things through the day.
For example, say I have two customers and only one car. Suppose I send this car to, let's say, the airport, where there are no flights arriving for the next three hours. Compare that with sending it to, let's say, Majestic, where I know the driver will get another booking immediately. So this one decision I'm making now has multiple second- and third-order effects, which I cannot make out by looking at that one decision alone. So we need simulation models to actually track the network effects. All these come together in a list of business metrics that we track: what is my cancellation rate? What is the utilization of the drivers? And from these comes the feedback loop into the models, to say which of them need to be tuned to drive these metrics in the right direction. It's very much possible that there is error in two models but, because the errors are in opposite directions, together they still achieve the same business metric. If I own only one of the models, I'll try to improve it and end up impacting the business metric negatively. There are multiple examples like this across different domains. Take aviation, for example. I know I have only five minutes left, so I'll move quickly through this. The aviation domain fits very well into the same kind of objectives and models. There too the levers are pretty much the same: you price, you give offers, you have your crew, whose schedule you can control, and you have your routing. The data is very much the same: it comes from your websites and from the travel sites. And the models you build there are similar as well. You're building customer models as to whether customers convert at given prices. You're building crew behavior models: whether they accept a particular schedule, how much rest they need, whether they'll respond properly.
If you give them a roster, that is. Then there is demand volume, your aircraft maintenance schedule, and how you need to route aircraft. You also need a lot of prediction on delays: because of the weather, how much delay do I expect at this airport, which will again have second- and third-order effects and probably disrupt my schedule at an airport very far away from here. So any problem you take, across businesses, fits into this framework of system thinking very well. One last example, on e-commerce. By now I think pretty much all of you can fill in the blanks here. The objectives are very similar. What you control is very similar: prices, your inventory, your delivery, that is, how you manage your logistics to deliver the product. The models again come back to volume prediction, customer behavior prediction, and network simulation to see the effects of, say, something getting disrupted in the delivery logistics. So the main point I would like to drive home is that for any business, we now have a general view of the system of models that come together. Mostly there are behavior prediction models, which are mostly classifiers. To predict this behavior, you need to group them in some way because there is so much heterogeneity; you need to identify homogeneous groups, which means a lot of clustering and segmentation comes into the picture. You need volume prediction for any business, to manage your inventory, supply or cost. Something I didn't touch upon is that a lot of things go wrong, especially with the sharp minds we have in India, so fraud detection models involve a lot of anomaly detection. And in e-commerce especially, whenever you have millions of products, recommendation is an important factor.
And simulation, definitely: even before going live, most businesses have to see how a change will do with all the network effects coming into the picture. So as a summary, what I would like to say is that data science for a business is not a collection of 30 independent models each optimizing its own metric. Then they become like the blind men and the elephant. Put together, it is about solving a single optimization problem with multiple levers that jointly affect each other and are often at loggerheads with each other. That's the main point I wanted to drive. Any questions?
This is a failure because of system thinking, not because of data science, because as a data scientist I would have looked at several data points, which might have included a comparison with my competitor, and it would have suggested a lower or equivalent price, not a higher price. But because of Ola's system thinking, I have to do surge pricing and I have to increase the prices.
I would say it's not the fault of system thinking, because the competitor is also a part of your system. When I say a system, it's a set of connected things interacting with each other, and your competitor is never outside that system. A lot of the time people struggle because they don't have competitor data to bring into the system, but that's not because of system thinking.
But you have connected data science with system thinking, and there is a model which is failing. If it's not data science, then what else?
Here the model is failing because you don't have the data for the part of the system it's interacting with.
But my objective is to get the data. So I should have got the data and only then incorporated it.
Yeah, but if you look at the data sources I listed, third-party data is there, but if the third-party sources don't cover your competitor's system, you don't have that part of your system to make decisions on. It's not because of system thinking. Okay.
So in an ever-growing set of models like your business has, how do you decide on feature extraction? For example, on a given day, as you keep adding more data sources to the team, it's important to select the right features. Do you follow any process, or what are your thoughts on how to go about feature extraction and feature selection?
Right. On some aspects of it, I think today's talk by Gojek was very relevant as well. Every signal you're able to get live from the ground, whether it's driver location or customer app data, all this data coming into your system adds more and more signals. For solving a particular part of the problem, whether to use a given signal as a feature is pretty much at the data scientist's discretion. And I would say a lot of it comes from talking directly to the business operations and product teams, because they know certain things. You'll probably put in two days and come up with an insight, and they'll say, "I knew this. If you had asked me two days ago, I would have told you." So a lot of feature engineering actually comes directly from the on-ground team saying what's important for a problem.