All right, good afternoon everyone. It is a pleasure to be here. What I want to talk about today is a topic we take for granted but never discuss explicitly: what is the real skill in data science? We talk about all kinds of skills in data science, and in my mind, after doing twenty years of data science, I have realized that formulation is the key skill. So I am going to talk about what that is and give you a lot of examples of formulation thinking, and hopefully you will take away a lot. But before I start, somebody told me the talk after lunch is okay, but the talk after that talk is the hardest, yeah? So I said, okay, if you do not get anything, it is not my problem, yeah?

All right. So let me talk about how technology is evolving very quickly in all dimensions. Tools are evolving, all the way from bows and arrows to missiles. Sensors are evolving, all kinds of telescopes. Computing is evolving. Data, and the very idea of data, is evolving. Interfaces are evolving, from UI systems to voice-based systems. And the idea of intelligence itself is evolving. All of these things are evolving in front of us, and whether you are into tools, say robotics, or into sensor technology, computing, or AI, you have seen this evolution going exponentially. As a result, the idea of a product is also evolving: today people talk about autonomous vehicles or smart cities. The idea of AI thinking is evolving; in my previous set of talks I used to talk about what that is. And what I am going to talk about now is that the idea of data science skills is also evolving. What you thought was a good data scientist five years ago and what it is now are very different notions, and I will talk about that. The reason is that a lot of AutoML has already happened, so we do not need to know a lot of those things anymore, but our skill level has to go to the next level. In a way, AI is not just going to take away other people's jobs; it is also going to take away our jobs, yeah? So we need to start thinking about upskilling ourselves, and what that means for a data scientist is what I want to highlight.

So think about our own skills. You say, oh, I want to be a data scientist, what do I do? You acquire all these skills: you learn different algorithms, you acquire domain knowledge in different domains, you do a lot of big data computing, Spark and Hadoop and all of that, and then, reluctantly, you say, okay, I will also learn math. And then what happens is, if you think about all four of these skills, they really boil down to one core skill: the skill of formulation, the art of formulation. That is the soul of what a data scientist does; everything else is peripheral. That is the key message, and I am going to harp on it with help from other people and from my own experience: formulation really is the main thing. If you know how to formulate, the rest is just writing a one-line function call somewhere, and that's it.
In any science, if you look at how we arrived at the laws of gravity, or the laws of quantum physics, or the laws of aerodynamics, it was really people formulating those laws, and everything else is built on top. We can fly, we can go to the moon, because somebody formulated the basic laws. That is how we need to start thinking. And what does that mean? This is not something we are starting now; we have been doing it since fourth or fifth grade. Remember those problems? Word algebra problems. My daughter is struggling with them now, and I realize how we used to struggle with them. And if you are not good at those, you should not be here, yeah? That was our first journey into formulation. You are given a problem statement and you formulate: you come up with variables, you come up with relationships between the variables, and you say, what you are saying in English, this is what you are really saying in math. Once you have that, you do not have to worry anymore; there are solvers to solve it. Our real journey is from there to here.

And how does this translate from eighth grade to where we are now? Tomorrow your boss will not say, hey, build a clustering model over all my customers. He will not say, predict churn. He is not going to tell you, do demand forecasting. What he is going to ask is: which market should I grow my business in next? Or, why am I losing all my customers, and what should I do about it? How do I improve my operational efficiency and reduce the cost of operations? What is the best price for this product, in this store, at this time? How do I get maximum ROI on my budget? These are the questions the boss is really asking. These are what we call business problems, pretty much like those word algebra problems. Our job starts there: how do we work backwards from that statement to say what kind of problem it is, what kind of data to use, and so on? So the idea of formulation starts from the statement of a business problem, and that layer is what I see missing. I teach a lot of data science people, I give a lot of talks. A lot of people know a lot of tools; a lot of people have good domain understanding. But these two groups do not talk to each other, and the layer of people who can actually formulate a business problem into a set of machine learning problems is what is missing. That skill, I think, we need to start developing at scale, yeah? Machine learning does not start with Python or R or TensorFlow and all that good stuff. It really starts here. And you realize that after three years of doing the other stuff: hey, somebody has already automated what I was trying to do; this is what I actually needed to learn.
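To make the eighth-grade version of this concrete, here is a made-up word problem (my own illustrative example, not one from the talk), formulated the way he describes: name the variables, state the relationships, hand it to a solver.

```latex
% English: "Two numbers add up to 30, and their difference is 4."
% Formulation: introduce variables and write the relationships in math.
x + y = 30, \qquad x - y = 4
% Any linear solver finishes the job from here: x = 17,\; y = 13.
```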
One of my favorite movies is Alice in Wonderland; there is a very beautiful quotation in there. When Alice is lost, she asks the cat: which road should I take? And the cat asks her, well, where do you want to go? She says, I don't know. And the cat says, then it doesn't matter; you can take any road and it will get you there, yeah? That is precisely what is happening today. People ask me, and I get at least one such message every day: I want to be a data scientist, should I learn Python or R? And my question back to them is: did you write this message on a Mac or a Windows machine? You understand, no? So how do I take that person from there to here? I give a talk, somebody watches the video, I point them to this and say: here is how to think about it.

So what I am going to do is talk about different flavors of formulation: formulation as taking an intuition to math, formulation as picking a modeling paradigm, feature engineering as formulation. How do you optimize an infrastructure like a telecom network; how do we formulate that? How do you improve operations at Ola; how do you formulate that? How do you think about an overall solution, not just individual building blocks? And how do you deal with complex systems? I am going to formulate all of these for you, so you get examples at different scales of what this skill of formulation is.

Really, the most important skill we have is not intelligence; it is intuition. There is a very subtle difference between intelligence and intuition, and the craft is taking your intuition and then formulating it. So let me start with a very simple question. If I give you a bunch of numbers and ask for their mean, you go back to sixth or seventh grade and write down the formula. But there is no such thing as a formula for the mean of n numbers. What is the intuition first? The intuition is: whatever the mean is, it is a number that is close to all the numbers. There is no formula for the mean; there is only an intuition to start with, and then you formulate the intuition. You start with the English statement and you say, look, this is how I could formulate it. Now, what happens with your first formulation is that it captures the intuition, but it is not mathematically pleasant. It has to resonate both with the intuition and with a nice optimization algorithm. So you do a refinement step: the absolute value is not differentiable, so let us square it instead, and now you can plug it into any QP solver and it will give you the answer. The mean of n numbers is not a formula; it is the solution to an optimization problem.

Once you understand this principle, none of your machine learning algorithms is just a formula. What is an eigenvector? What is a Fisher vector? These are not formulas; they are the result of an intuition, followed by a formulation, a refinement, and then an optimization. Take another one. We take heads and tails for granted, but what is the probability of a binary event? You can say the number of heads divided by the total number of tosses, but that is not the answer. The real answer is: it is the probability that best explains the sample data. That is the real intuition, and then you formulate it as a Bernoulli likelihood. Again, it captures the intuition, but you cannot differentiate it easily because it is a product of numbers, so you refine it, take the log, and then you get your answer. So your answers are not really the formulas you memorized. Follow this same journey and you can derive XGBoost, derive GANs, derive SVMs.
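Written out, the two derivations he walks through look like this (standard results, filled in here for completeness):

```latex
% Mean: "a number close to all the numbers."
\mu^* = \arg\min_\mu \sum_{i=1}^{n} |x_i - \mu|
\quad\xrightarrow{\;\text{refine: } |\cdot| \text{ not differentiable}\;}\quad
\mu^* = \arg\min_\mu \sum_{i=1}^{n} (x_i - \mu)^2
\;\Rightarrow\; \mu^* = \frac{1}{n}\sum_{i=1}^{n} x_i

% Coin: "the p that best explains the sample" (Bernoulli likelihood).
p^* = \arg\max_p \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
\quad\xrightarrow{\;\text{refine: take the log}\;}\quad
p^* = \arg\max_p \sum_{i=1}^{n} \bigl[x_i \log p + (1-x_i)\log(1-p)\bigr]
\;\Rightarrow\; p^* = \frac{\#\text{heads}}{n}
```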
That is the skill we need to be that kind of data scientist. How about this one: what is the best decision boundary between two classes? Is there a "best"? And then one day Vapnik comes in, and he defines very precisely what the best decision boundary is, and he follows exactly this process. He says: the best boundary is the widest possible boundary between the classes that does not violate any points. He writes down the first version of that formulation, and then he says, look, this norm of w in the denominator is not easy to optimize, let me convert it; so he gets the primal problem, and then he refines it further and gets the dual problem. After that the job is done, because a QP solver takes care of the optimization, yeah? (You will see this written out in symbols right after this passage.) So understand: all the machine learning algorithms that have been created were created through this process. What we have seen is only the end of the process, implemented. So we think machine learning is just about using those APIs. It is not. It is about learning this process, so that you can invent new algorithms someday, yeah?

All right. Another thing we deal with: you take a bunch of Coursera courses, and now you know, like, 200 different algorithms. It is like a carpenter who knows a hammer and an axe and all these tools. Your boss says, hey, find the best marketing strategy for me, and you are wondering: I don't know, I never learned that; I learned clustering and SVMs, why are you asking me that? Our job is to say: look, I think this is a clustering problem. I will take all your customers, group them by what they buy, and then send each group the right coupons. Whoever can do that job is the real data scientist we need today. Writing k-means clustering has been done three hundred times; we do not need to rewrite it again. Yeah, so that kind of thinking.

Is it a perception problem? Say I have a company and I want to build a home security camera, and I hire a data scientist. What is the first job? To recognize people, things in the house, whatever. Or you are building a phone app and you want to know what activity the user is doing. Or you want to build a sentiment system, and the boss says: I want to know what people are saying about my drug, or my restaurant. He is not saying, solve a sentiment analysis problem. You have to do that translation: which entity, which aspect, is the sentiment positive or negative? Is it an outlier problem? Somebody will say, look, there is a lot of hacking going on; yesterday we heard Capital One lost a lot of data; a lot of fraud is happening in the world; how do I minimize fraud? Your job is to say: I am going to formulate this as an outlier detection problem, and then solve it. Similarly, if I ask which customer is about to churn, or which part is about to break down, you have to formulate it in a very different way: you have past data, and you want a prediction so many days ahead. How do you formulate that? All your credit models, everything, is formulated differently. Prediction is very different, outlier detection is very different, detection in general is very different. These are all subtle distinctions underneath what you might casually call classifiers or regression models, right?
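Returning to Vapnik's sketch from a moment ago, the journey he describes, written in the standard form (notation mine), is:

```latex
% Intuition: the widest possible boundary with no violated points.
\max_{w,\,b}\ \frac{2}{\lVert w\rVert}
\quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 \;\;\forall i
% Refinement: the norm in the denominator is awkward, so flip it
% into the equivalent primal QP ...
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2
\quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 \;\;\forall i
% ... whose Lagrangian dual is the form a QP solver actually consumes.
```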
Is it a community detection problem: am I finding groups of things? In retail, for example, you may be finding groups of products that go together. Somebody says, hey, improve my sales, improve cross-sell, and you have to figure out: oh, I am supposed to find groups of products that go together. Somebody says, increase watch time on TV or Netflix or whatever, and you have to figure out how to do that, yeah? Another class of problems that is picking up now is reasoning problems: how do I think not just about a prediction, but about a sequence of steps that leads to the final answer? How do you do question answering? Math problem solving? Conversations? These are not just NLP problems; they are reasoning problems. What about reinforcement learning; is it a reinforcement learning problem? For example, we are now working in agriculture and healthcare, and there are all these people doing weather prediction, or soil IoT, or taking a hyperspectral image of a crop and telling you the crop health. Those are individual classifiers. But what is the real formulation of the agriculture problem? Agriculture is a reinforcement learning problem: you have the state of the crop, you understand it, you decide what to do now, and based on what you do, the crop moves to the next state, and you learn, or fail to learn, from that process. Healthcare: a Fitbit is not just a classification problem; it is a reinforcement learning problem. Education is a reinforcement learning problem. There are a lot of problems of this type that we need to solve. And is it a deep learning problem? Not every problem is a deep learning problem, yeah? We all agree? Yeah, thank God. Because every time I throw up a problem, you think: yeah, yeah, TensorFlow, VGG, done. No, that is not right. Think a little bit first. So this is the skill: out of the hundreds of different modeling paradigms, how do you pick the one that precisely formulates your business problem as a machine learning problem?

All right. The next thing we will talk about is feature engineering. Feature engineering is another art that is being lost because of deep learning. You say, ah, deep learning will do everything. When I talk to people, I can actually tell the age of a data scientist by this question. If you are 35-plus, you are one kind of data scientist; if you are 30 or below, you are probably the model-centric kind. For him, the model will take care of everything: just throw raw data at it, an image, an audio signal, raw text, and let the model handle the rest. We are not formulating anything at all. The other mentality, the 35-plus, 40-plus old-school folks, says: let me derive one nice feature at a time. Like in the movie Dangal, right? A champion is not born; he has to be nurtured. You work on each feature, so that each one is powerful enough to go into a simple model.
Just to give you an idea, take a problem like this one. If you are a brute-force data scientist, what will you do? You will throw a big neural network at it, yeah? You say, I can increase complexity all day long and I will beat this data into submission. But if you are a lazy guy like me, you will say: look, this looks like a radial function; why don't I create one radial feature and just put a logistic regression on top of it, yeah? These are two very different mindsets. Neither is strictly better than the other, but again, it is a formulation choice: do you formulate it this way or that way? And there is a law of conservation here: the complexity lives either in the features or in the model. Whether you spend three months engineering a feature set, or three months coming up with the neural network architecture, you can choose whichever; it will still take you three months, yeah?

Okay. Another kind of feature engineering is very domain-knowledge centric. If I give you these four raw features and say, build a classifier for fraud versus not fraud, there is no model, linear, logistic, or neural network, in the world that can take those four features as-is and predict the class, fraud or not. However deep your neural network is, you cannot do it, yeah? Very simple problem. But if you derive one very simple feature, say a velocity feature, the ratio of this and that, that one feature is so powerful, and it is monotonic in the class label. So the power of feature engineering is also a very important part of the art of formulating an overall machine learning solution.
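Here is a minimal sketch of that lazy, feature-first mindset on the radial example above, assuming a toy dataset where the class boundary is a circle (data and names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2D data: the positive class sits inside a circle around the origin.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.4).astype(int)

# The one engineered feature: distance from the origin. In this feature,
# the classes are linearly separable, so a simple model suffices.
r = np.sqrt((X ** 2).sum(axis=1)).reshape(-1, 1)

clf = LogisticRegression().fit(r, y)
print(clf.score(r, y))  # near-perfect accuracy from one feature + one line of model
```

The brute-force alternative would learn the same circular boundary from raw (x, y) with a multi-layer network; the complexity just moves from the feature to the architecture, which is exactly his conservation law.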
All right. So now let us talk about one of the problems we are working on in telecom, which is a very interesting problem. How many of you use a Jio phone, or any other phone? Okay, we need to do more marketing here. No, that's fine, I'm here for that. So you have seen call drops, or places with no coverage, all kinds of problems. The problem statement is the following. You are given the current network, whatever it is: you have towers in certain places. Based on that network, you can tell which grid cell in the city has good coverage, poor coverage, very poor coverage, or no coverage. Think of this as one small part of a very large grid, say 100 meter by 100 meter cells; for the whole country you have billions of these grid points. So you can compute this map and say: given my current network, this is the state of my coverage. And the real question we are asked to solve is: tell me where I should put the next tower. The lat-long of the tower. What should its height be? Because the higher the tower, the further its signal goes. What should the azimuth be? Azimuth is the angle of the antenna, and there are three antennas, so you can have three angles. And what should the tilt of each antenna be? Those are my control variables, and now you have to formulate this as a machine learning problem, yeah?

All right, let's do a quick exercise; good data scientists here, right? How would you formulate this problem? Our first attempt was probably what you are thinking right now. Whenever you look at a 2D space with a bunch of contiguous regions, you say: yeah, it's clustering. I go to each of these clusters of poorly covered grid cells; these nine cells get into one cluster, these six into another; I take each cluster center and put a tower there. Looks very reasonable, right? That is one way to formulate it. But the problem is, when you have two billion grid cells, you don't even know how many clusters there are. It is not going to be five; it is going to be millions. So we said: definitely, this is not a clustering problem, yeah?

Then we started thinking about how to formulate it, and we took a different approach. Look at what happens when you have a set of candidate towers. If I decide to put a tower at this location, at this height, this azimuth, this tilt, with this clutter (clutter means what is around the tower: a building in front, a mountain in front; the topography around each potential tower matters), then I can tell you which grid points that tower will cover, from line-of-sight kinds of calculations. That knowledge I have. So I can use it to build a very simple matrix: list all the potential towers I could put up, list all the grid points that need coverage, and ask: if I put a tower at this location, at this height, this azimuth, this tilt, with this clutter, will or will not tower t_k cover grid point g_n? It is a very simple coverage matrix. So now I have a coverage matrix, yeah? Now what do I do? There is a cost for every tower: depending on how high and how big it is, the real-estate cost of the land, the maintenance cost; every potential tower has a cost. And then we create a binary decision variable that says: should I or should I not put up this tower? These are 0/1 variables, and if x_k is 1, it means I build that tower at that location, that tilt, that azimuth, that whole combination, yeah? And now I can formulate an optimization problem: given my binary decision variables and the costs, minimize the total cost, while making sure every grid point is covered by at least one tower. This formulation has nothing to do with machine learning. It comes from computer science. It is called a set covering problem; here, a weighted set covering problem. We had been looking in the data science community for how to solve this, and it is not a data science problem; it is a computer science problem, very well studied for a very long time, just not in this context. How do you bridge that gap? You start from first principles, you think it through, and then you say: oh, it is a very simple problem. The only catch is that set covering, like so many classic combinatorial problems, is NP-hard. That is another problem. But now we have the cloud, and genetic algorithms, and all the good stuff to attack it, yeah?
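In symbols, the weighted set covering formulation he lands on looks like this (notation mine, matching his description):

```latex
% c_k : cost of candidate tower k;  x_k \in \{0,1\} : build it or not.
% A_{nk} = 1 if candidate tower k covers grid point g_n (the coverage matrix).
\min_{x}\ \sum_{k} c_k\, x_k
\quad \text{s.t.}\quad \sum_{k} A_{nk}\, x_k \;\ge\; 1 \quad \forall\, n,
\qquad x_k \in \{0,1\}
```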
All right, another problem I will talk about is very interesting; it is in the operations world: how do I run Ola as a fleet management system, yeah? How many of you took a cab today? Uber, Ola? See, we are affecting your life: if you make a phone call, it's here; if you took a cab, it's here, yeah? So, how do we do allocation? You book a cab, and suddenly some driver gets allocated to you. There is a magic behind that, and I want to share that magic, so you understand how we formulated the allocation problem as a machine learning slash optimization problem, yeah?

Here is a very simple picture. You have a set of cabs on one side, call them A1, A2, up to AM, and a bunch of customers on the other side. I want to know which cab to allocate to which customer. It is a bipartite graph, if you will. Now, I can estimate a cost, or a goodness or a badness, whichever way you want to look at it: if I allocate this cab to this customer, what is the cost? If the cab is very far from the customer, the cost is high. If the customer is a woman and the driver has a bad rating, the cost of that allocation is very high. So I can fold different considerations into the cost. And once you give me the cost function, it becomes a simple problem, because now I can formulate it as an optimization: minimize the total cost of allocation, subject to every customer getting a cab and every cab getting a customer. This is the simplistic version, where the number of cabs and the number of customers is the same, yeah? Now, the thinker types go crazy at this point: but what if there is no cab? What if there are fewer cabs? What if there are more customers? Yes, we will come to it. Solve the simplest case first. That is another thing we need to learn: to think in a hierarchy of layers. We cannot handle the final version, with all the ifs and buts, on day one; that is not how we learned languages or anything else. So we start with the simplest formulation, equal numbers of cabs and customers, and we solve that problem.

How do we solve it? Again, we take a decision variable: x_mn is 1 if I allocate this customer to this cab, and 0 otherwise. Then I formulate it as a cost minimization problem: find the matrix x of 0s and 1s that minimizes the overall cost, such that every customer gets one cab and every cab gets one customer; that is, the sum over each row is 1 and the sum over each column is 1, yeah? Very simple. The method for this is called the Hungarian algorithm, long since worked out in the computer science and optimization world, and now we are applying it in the allocation domain, yeah? So there is no machine learning here yet; I will tell you where the machine learning comes in, but so far this is formulation, yeah? So do not think that because you know machine learning techniques, that is all there is. Your repertoire of thinking has to go beyond that: you need to learn optimization algorithms and computer science algorithms, and then that becomes the real pool of tools that you start applying to these problems, all right?
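A minimal sketch of the square cabs-equals-customers case he describes, using SciPy's Hungarian-style solver on a toy random cost matrix (the costs here are placeholders for the distance, rating, and preference signals he mentions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[m, n] = cost of allocating cab m to customer n
# (pickup distance, driver rating vs. rider profile, etc., all folded in).
rng = np.random.default_rng(0)
cost = rng.random((5, 5))  # toy instance: 5 cabs, 5 customers

# Minimum-cost perfect matching: each cab gets exactly one customer.
cab_idx, cust_idx = linear_sum_assignment(cost)
print(list(zip(cab_idx, cust_idx)))      # the allocation
print(cost[cab_idx, cust_idx].sum())     # total cost of that allocation
```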
So now, where is the learning here? What is the key remaining piece? That is another thing a good data scientist figures out very quickly: this is all making sense, but in this picture, what are you still not comfortable with? Where is the ambiguity? The cost. How you compute the cost is the ambiguity. Should I weigh the distance more? Should I weigh this factor more, or that one? And that is the part we learn. So we say: here is the graph; I need to first compute the cost somehow; once I have the cost, I know what to do, yeah? What do we do? We come up with a bunch of features. Is this customer ETA-sensitive or not? If he is, I am not going to send him a faraway cab. Does this driver want to head home at this hour? If he does, I will only allocate him a customer going in that direction. Think about all these little signals that go into the system just to compute the cost of one allocation; that is where the ambiguity lives. A lot of past data comes in, and a bunch of parameters that say how much to weigh this feature or that variable, all just to produce the cost function, which then gives me the cost matrix, which then gives me the allocation, which then gives me the final end-of-day metrics: did all the drivers make enough money? Did all the customers get a minimal ETA? Those are my end-of-day metrics. And once I know the end-of-day metrics and which parameters theta_1, theta_2 I used, then I do the learning: that day I applied this theta vector and I got these metrics; let me now try a different theta vector and see what metrics I get. So understand the learning loop. It is a combination of two things: a machine learning kind of system, or a reinforcement learning kind of system if you will, where theta is the control variable, and a simple computer science piece, which is the Hungarian algorithm, yeah? That is how allocation is done. So next time you take a cab, you know what is going on while the app is circling, yeah?
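A sketch of the learnable piece, assuming (as he describes, with all names hypothetical) that the allocation cost is a theta-weighted combination of hand-built features:

```python
# Hypothetical per-pair cost: a theta-weighted sum of engineered features.
def allocation_cost(theta, features):
    # features: e.g. {"pickup_eta": 7.0, "rating_mismatch": 0.3, "route_mismatch": 1.0}
    # theta:    matching weights, e.g. {"pickup_eta": 0.5, ...}
    return sum(theta[name] * value for name, value in features.items())

# Outer learning loop (conceptual): run a day with one theta, build the cost
# matrix from these features, solve the Hungarian assignment, then read the
# end-of-day metrics (driver earnings, customer ETAs) and try a perturbed
# theta next time: a black-box / bandit search over theta around the solver.
```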
All right. Now, another thing I learned (how are we doing on time? Let me slow down): another thing I learned at Ola. Before that, I was working at Yahoo and Google and companies like that, and there we had a collection of models: a search model, an ads model, a YouTube recommendation model, an auto-complete model in Gmail, a spam model. All different models, working independently in their own silos. And that is what I thought data science was: you pick a vertical or a silo, you take the data of that silo, you build the model of that silo, you are done; then you move to the next silo. That is how we start, bottom-up, yeah? Now, when I joined Ola, I realized something very profound: if you look at the problem top-down, the whole is always greater than the sum of its parts. If I solve each silo independently, do I actually solve the global objective function or not? And that is another level of formulation we need to start thinking about, yeah?

Because when I do smart city optimization, or electricity optimization, or connected car optimization, I need to think about globally optimal solutions, which cannot be found in silos. Here is what happened: in my first three months at Ola, I talked to all the PMs. The PM for allocation comes and says, this is the most important problem at Ola; solve this and everything else will be happy. The ride-sharing guy says, our sharing is not doing well; solve this and our revenue will go up. The pricing guy says, we still need to tinker with pricing; we are not making enough money. Each of these people, the one who deals with drivers, with customers, who makes offers, who does the routing, who repositions the cars, sits in his own silo, tinkering with his own little thing, and none of them can show that his tinkering moved the final metric. Because the other guy is also tinkering, neither knows the other is tinkering, and the two tinkerings interact in complex ways; we do not know whose tinkering caused the gain or the loss, and that is the biggest problem in data science. If I give you an input and an output, but I do not know why the output is what it is, because it could have come from any of these interventions and I do not understand the complex interaction, then I am lost, yeah? That is a very dangerous situation in AI.

So one day while commuting, I had an epiphany: Ola is not these twenty problems, yeah? That is not how you can think about a fleet management system, because all these problems are connected to each other. They are not like Google's ads and search and spam, which sit apart; they are all coupled. And then I realized that Ola can actually be formulated as a single objective function, with everybody looking at the same objective from their own perspective, yeah? Once you talk to all these people and look at things top-down, that is when you see what is really going on. So we formulated Ola as a very beautiful problem: you have supply, you have demand, and all you have to do is make sure that supply equals demand at all locations and at all times. Simple, right? After three months of "no, no, this is how complex this problem is, this is how complex that problem is," ultimately it boils down to something very simple: matching supply and demand. If demand exceeds supply in a certain area, what happens? Some customers do not get a cab. If supply exceeds demand in some areas, what happens? Some cabs sit idle and waste money. And if I can keep the two together, through whatever mechanism, I win. So what is pricing, really? If you have more supply and less demand, what do you do? You try to equalize them by reducing the price. If you have the opposite, less supply and more demand, you do the same thing in reverse. So what you are really solving in pricing is supply-equals-demand, by means of price. Allocation is solving the same thing. Onboarding more drivers on a day when you know demand will be high: again, you are solving supply-equals-demand.
So no matter what you do, you are really solving supply equals demand. After three months of talking to people, I said, I have a solution for you; everybody gathered, and I showed this one slide and said: this is all you are solving. And they said, no, no, we are solving pricing, we are solving this. I said, no, you are all solving this one problem, right? And this is the number I want to measure. What this number really says is: how do you quantify the goodness of a location and a time? At 9 a.m. outside this hotel, what is the value of that context for Ola? And if supply and demand are the same, this number is going to be one: you maximize the value from a location when supply equals demand, right? That is your simple formula. So we measured this efficiency, and then we said: look, now there is only one objective; whatever each of you does, together it has to increase this overall efficiency, yeah? Then the whole thing boils down to just three questions. How do you quantify the value of a context? How do you forecast the demand at a location and time? And how do you optimize the supply against that demand? That is it. And again, this is no machine learning; this is gathering domain knowledge and then formulating a top-down objective function for the whole system. By the way, everybody can use this, not just Ola and Uber: food delivery like Swiggy, grocery delivery, medicine delivery, online stores, offline stores, they are all solving the same problem, and this is their one objective function. They may say, I am doing supply chain, logistics, transportation, inventory management, pricing; all of those things are doing only this, yeah? That is how powerful formulation is. And now you can divide the work into three groups and say: you manage supply, you manage demand, and all that, yeah?
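He does not give the formula on the slide, but one simple ratio consistent with what he describes (it equals 1 exactly when supply matches demand, and falls below 1 in either imbalance) would be something like this; this is my guess at an instantiation, not necessarily Ola's actual metric:

```latex
\text{eff}(\ell, t) \;=\;
\frac{\min\bigl(S(\ell,t),\, D(\ell,t)\bigr)}
     {\max\bigl(S(\ell,t),\, D(\ell,t)\bigr)} \;\le\; 1
```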
All right. Now I will tell you something very interesting that we are doing now; we did it at Ola, and now we are doing it in fintech as well. The idea is this: what makes an overall system complex? If I had only one set of features, one prediction, one action, and one metric, life would be very simple. But at Ola I have many, many data sources. I can define the states of many, many things: a customer state, a driver state, a location state. This state thinking is itself a formulation we need to do. Then I can take many different actions simultaneously: I can increase the price here and decrease it there; I can ask that driver to move from here to there; I can give you a bigger incentive to come online at this hour. I can take complex combinations of actions at the same time, on top of a complex state. And then I have many, many metrics that I need to worry about simultaneously. What makes life complex is not having one of each; it is having interacting versions of all of them. And that is why fleet management is a very complex problem: you have to maintain all these metrics, and anything you do in this state, with that action, is going to affect all the metrics in a very complex way.

That is when I realized I needed to break this down, and we broke it into two parts. First, I said, let us have a state team: it just learns the state of locations, customers, and all of that. So here is the state of the system; here are the actions we took earlier (when everybody was tinkering, we were taking these actions, and we have logs of that); and here are the metrics we were measuring at the time. All the experimental work people do, all the tinkering, has been generating training data for you, all over the place. If I take all of that training data and learn a model that says, given this state vector (which is a very large vector) and given this action vector, these were the metrics; that interaction is too complex for you to write down an equation for, so you learn a deep network, or whatever you want, to capture it. And notice what the training data is here. We keep thinking of training data as features and a class label, inputs and outputs. In this world, the training input is a state descriptor plus an action descriptor, and the output is the metric descriptor, okay? If you think like that, you will learn a different kind of causal model.

Now, what do we do with it? We apply it to an optimization; we do the reverse. Say my city is in this state, yeah? I know the state of the city, and I know which metrics I want to maximize or minimize. The head of operations will come and say: look, my driver cancellations are going up, we need to minimize that, but make sure your revenue does not go down. He gives you a statement like that. Nobody tells you how to tweak the actions; they are only able to tell you which metrics to monitor and optimize, yeah? So you are given a state, you are given the metrics to optimize, and you work backwards to say: therefore, what actions should I take? And if you understand embedding learning, this is like that. Think of it this way: the forward pass is the network we learned earlier. Now I fix the causal weights. I start with some action vector, push it in, it goes up through the same frozen network, it produces some metrics, I look at the error against the metrics I really want, and I backpropagate that error all the way down. You cannot change the state, you cannot modify the network; the only thing you can change is the actions. The action vector gets updated; then you do another forward pass, another backward pass, until the metrics you get reflect the metrics you want. You see how we have, in a sense, abused a neural network to hand me the actions I need so that the metrics I want are satisfied, yeah? So training and inference are slightly different here: inference is, given a state and the target metrics, tell me the actions, using this trained neural network, all right?
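A minimal PyTorch sketch of that "abuse": freeze a trained state-action-to-metrics network and gradient-descend on the action vector alone. The network, dimensions, and targets here are all arbitrary stand-ins; only the mechanism matches his description.

```python
import torch

# Stand-in for the frozen "world model" trained on logged
# (state, action) -> metrics triples. Architecture is illustrative.
world_model = torch.nn.Sequential(
    torch.nn.Linear(16 + 4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)
for p in world_model.parameters():
    p.requires_grad_(False)  # causal weights stay fixed at inference time

state = torch.randn(16)                     # current city state: fixed
target = torch.tensor([0.9, 0.1, 1.0])      # metrics we want to hit
action = torch.zeros(4, requires_grad=True) # the only free variable
opt = torch.optim.Adam([action], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    predicted_metrics = world_model(torch.cat([state, action]))
    loss = torch.nn.functional.mse_loss(predicted_metrics, target)
    loss.backward()  # gradients flow only into the action vector
    opt.step()

print(action.detach())  # the actions that (per the model) yield the target metrics
```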
So let me formulate AI itself, yeah? Okay. We have been formulating individual problems; now ask: what is AI, and why do I have so many algorithms? Then I realized AI is a very simple stimulus-response learning loop. You have stimulus: all of your logged data is stimulus. Fitbit data, the SMS from your banks, satellite data, point-of-sale data: all stimulus. And one class of algorithms in AI only goes from stimulus to state: what is the state? What is your financial health? Your financial discipline? Your capacity score? Those are five scores I can generate from all the SMS you get from your banks: you missed a payment, you made a payment; from those messages I can say your discipline score is bad. So, how do I convert your stimulus into a set of state vectors? You can call them features. Then, how do I take the state vectors into actions, which generate a response? And then, how do I take the stimulus and the response together and ask: did it work? I allocated this cab to that customer and the customer cancelled; that is the response. I take it as feedback and I learn: maybe my stimulus-to-state was wrong, maybe my state-to-action was wrong, or maybe my manifestation of the action was wrong. And all of your AI algorithms will fit into these two halves. When you go to Coursera and look at ten thousand courses, all you have to ask is: am I learning stimulus-to-state algorithms (vision, speech, text-to-intent; they are not fancy things, they are just stimulus-to-state), or state-to-action (reinforcement learning, rule-based systems, utility functions, MDPs)? If you think about it this way and then start your AI journey, it is very simple. You are not lost, because you know what the forest looks like.

All right. So here are my guiding principles. Know your algorithms, right? Not just the APIs; know your algorithms. But do not fall in love with them, okay? Because the next algorithm is coming, and if you are in love with this one, you will be very depressed when that one is better. Remember random forests? Everybody was a random forest person, and then XGBoost came, and all these people practically went into depression: oh, I have to learn XGBoost now; my ranking is low. So do not fall in love with your algorithms. They are all great, all useful in different ways, but you have to know when to use what. Second: data science is not about data. Whenever people say, oh, I am a data scientist, give me the data, I will do everything you want: no. Stop it. There is no such thing as data science being about data. Data science is really about formulation, yeah? Working out the idea of the features, which modeling techniques will work here, which states, actions, and metrics to think about: for that exercise you do not need to see data at all. You need domain knowledge, and you work backwards: this is the state I want; now tell me, do I have data for it or not? So state thinking is more important than data thinking. Third: learning causality is not enough. We keep building machine learning models: predict this, predict that, score this, score that. Causality alone only gets you from stimulus to state; state to action is an optimization problem, and you have to apply optimization on top of it, yeah?

The next one: our job is not to build models. When I say these things, my data scientists go crazy: really? I thought that is exactly what we are supposed to do. I open a Jupyter notebook, import this, import that, and then you tell me what to do next, right? No. That is not what you do. Data scientists are not supposed to build models. Think about it. Once you have formulated the problem, understood the data, engineered the features, and done all your parameter sweeps, the output of all that exercise is not a model. It is a config file: a config file that says, these are the features I need, these are the models I will use, these are the hyperparameters, after doing all of that work. The output of a training exercise is not a model; it is a config file that tells you what your features and your models should look like. You deploy that config file into an actual production system, which builds your models every day, yeah? So our job is not to build a model but to discover the right nature of the model, and then engineer the whole thing, so that in production, the modeling (daily, weekly, whatever you need) and the real-time inferencing happen automatically.
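A sketch of what such a training-output artifact might look like; every field and name here is illustrative, not a real system's schema:

```python
# Hypothetical output of the training exercise: not a pickled model,
# but a recipe the production system re-runs on schedule.
model_config = {
    "features": ["txn_velocity_7d", "avg_ticket_ratio", "geo_mismatch"],
    "model": "xgboost",
    "hyperparameters": {"max_depth": 6, "eta": 0.1, "n_rounds": 400},
    "retrain_schedule": "daily",
    "inference": {"latency_budget_ms": 50},
}
```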
So when I ask my data scientists to deploy something, they go crazy: I do not know how to deploy anything; I can build a model. So I take a very deployment-first approach. I say: deploy a model, then train it. And they say, really? Why would you deploy a model that is not trained? I say, give me a random neural network that runs in production, and then I want you to improve it and replace it. Otherwise it is pure science, right? Nothing real comes of it. And the last principle: AI thinking is not about input-output; it is really about the continuous learning loop. People think, okay, I built a model, it gives me some output for an input, done. But there is a loop: your model is failing on certain records, your model is confused on certain records. So what do you do? You send them to the labellers, they label them, the data comes back, you train again. That continuous learning loop is what you need to think about, not a one-time modeling exercise. That is when we are really done. All right, those are my guiding principles. And, like the dialogue says, there is no spoon; there are also no hammers in machine learning. Thanks.

So, we can take just two questions before the break. Anyone have any questions? Yes. Yeah, I told them to make this the first session or the last session, but they put me somewhere in the middle, right? This gives you a lot of perspective; now you can enjoy all the other sessions. Otherwise it is like: yeah, yeah, that is good, but why? Okay, I think we do not have any questions here. Yes: so, at the start you talked about four data science skills, algorithms, domain knowledge, big data skills, and mathematical and statistical skills, and at the center of them lies formulation. How do you think the depth of knowledge in each of these skills impacts the final formulated product? So, no, it is not about depth. Our growth and education should be like iterative deepening. You know these algorithms in computer science, right? Depth-first, breadth-first, and iterative deepening. We should learn any field in an iterative-deepening way. That means you learn everything at some level of abstraction. Then you apply it.
Then you go one level deeper and learn everything all over again at the next level of abstraction. There is no such thing as going off to become an expert in SVMs and emerging five years later as a data scientist, right? That is the depth-first approach. Iterative deepening is how you should educate yourself. All right. Thank you, sir.