If the delivery executive reaches before the food is ready, then the executive ends up waiting there, and if the delivery executive reaches after the food is ready, then during that time the food was waiting. Both of these are bad options for us, but in any case the pickup is done once both are ready, and then the last mile happens. Just to be clear, the last mile is not just reaching the customer's doorstep; it also involves navigating through the society or office building the customer is in, and actually delivering and handing over the food packet to the end customer.

So the first problem is capacity. In very short and simple terms, capacity is about how many orders I should accept. Given I have, let us say, 100 DEs (delivery executives), what is the number of orders I should accept? In other words, at what point should I start saying, no, I am not going to take any more orders from customers? That is the high-level problem. Now, given 100 DEs, I can deliver, say, 300 orders given enough time. So the first constraint is obvious: you do not want to serve orders to customers two hours down the line. There is a maximum SLA, and I do not want to serve orders beyond that time. The second important thing, which we have already pointed out, is how many delivery executives I have at any given point in time; that is the base-level determination. If I have 100 DEs, what is the minimum number of orders I can serve? Obviously 100, and you have to do better than 100, because you have 100 delivery executives.

Beyond 100, a lot of factors start affecting how many orders we can actually serve within the max SLA time. There are things like traffic conditions on the road: if traffic is light, each of my delivery executives delivers orders faster, comes back, takes up the next order and delivers it. So now you can deliver not 100 but maybe 120, 150, 170 or even more. So one factor is traffic; then weather and other conditions that may or may not happen; customer density, meaning how close customers are to each other; and customer-to-restaurant distances. And distance is actually a proxy for time: in Bangalore, for example, 1 kilometre could take you half an hour to travel, depending on traffic conditions. All of these factors in conjunction determine the true capacity I have with the same number of DEs.

Now let us say you still have 100 DEs and all of these things remain the same. Say yesterday you delivered 140 orders; with the same 100 DEs, everything else remaining the same, traffic conditions the same, the number of orders coming from customers the same, can you still deliver 140 orders? Not necessarily, because 100 delivery executives at 7 p.m. is a very different situation from 100 delivery executives at 10:15 p.m.: many of that hundred are going to log off in the next 10 or 15 minutes, whereas at 7 p.m. more are going to join the fleet. So your true capacity is not just dependent on these factors, the number of DEs and so on; it is also dependent on what is going to happen, what is predicted to happen, in the next n minutes.
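As a minimal sketch of this idea, with every number and weighting invented purely for illustration (this is not Swiggy's actual system), the upper bound in a planning window can be thought of as the fleet online now, adjusted by predicted log-offs and log-ins:

```python
# Hypothetical sketch: "true capacity" depends not just on the DEs online
# now but on predicted fleet churn in the next n minutes.
# The half-DE weighting and orders-per-DE figure are illustrative assumptions.

def effective_capacity(online_now: int,
                       predicted_logoffs: int,
                       predicted_logins: int,
                       orders_per_de: float) -> float:
    """Rough upper bound on orders servable in the planning window."""
    # Assume a DE who logs off or on mid-window counts as half a DE.
    usable = online_now - 0.5 * predicted_logoffs + 0.5 * predicted_logins
    return usable * orders_per_de

# 100 DEs at 7 p.m. (fleet growing) vs 100 DEs at 10:15 p.m. (fleet shrinking):
print(effective_capacity(100, predicted_logoffs=5, predicted_logins=25, orders_per_de=1.4))   # ~154
print(effective_capacity(100, predicted_logoffs=30, predicted_logins=2, orders_per_de=1.4))   # ~120
```

The same fleet of 100 gives two very different answers purely because of what is predicted to happen next.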
So now you see the nature of the problem is such that a lot of external factors come into play: traffic, weather conditions, an accident on the road, road construction going on; multiple things which are neither predictable nor in our control determine what our capacity is. Another example is the batching possibility: you can take the same delivery executive and give him more than one order, as long as the customers are nearby and multiple other conditions are satisfied. So the capacity problem is really about how close to the actual capacity we can get. We do not know what the real capacity is; it is very hard to determine. So it is more about the costs. Suppose I over-predict my capacity, which means I end up taking more orders than I should have. What happens is that customers now get their orders delayed, so they have a bad experience. And if I under-predict my capacity, then I have left orders, and hence money, on the table. Both of these outcomes are bad, and hence it is very important to operate at the true capacity level.

Now, capacity has two notions here. One is the more common form, the aggregated capacity, which we calculate at zone level. In some sense it is a ratio of how many orders we have with respect to how many delivery executives we have. The traditional notion of capacity, and this is the stress in the system, treats how many orders we can take as an upper bound. That bound was traditionally calculated through a lot of learning in a very static, human-learning mechanism; that used to be the traditional way of computing capacity. And the reason it would still work is that, given enough days and enough experience of the people running on-ground operations, it used to be okay sometimes. But that is no longer going to scale: it is not responsive to sudden changes in conditions on the road, it is not responsive to weather conditions, because rain happens once in a while and when it happens you do not know what the intensity will be, and so on. So what is the right number at which you would stop accepting more orders? This is one of the challenges. What it often also leads to is that if I, as an operations person, have an incentive to serve the maximum number of orders, I may tweak these numbers without any scientific reasoning, and that could lead to bad orders, customer dissatisfaction and so on.

So we worked through this problem for quite some time, tried various algorithms and came up with different statistical models. As I said, I am from the delivery engineering team, so I work very closely with the data engineering and data science teams. The current version we are working with is called the order limiter. It is very much dependent on what is happening in real time on the ground: what state the delivery executives are in and where they are in real time; how close they are to their destinations; what is happening to the food being prepared at the restaurant; was the last signal we got on time as predicted, or was it delayed by 10 minutes, or early by x minutes?
So all of these conditions are taken into account, including things like when each delivery executive will get free, and we now have a much more dynamic version of capacity. Remember, this is not stress; stress is still the old notion of how many orders I have with respect to how many delivery executives there are. This is more about what my upper bound is. This kind of system, as you can see, is very dependent on what my ETA was, what my SLA was, and what the current state of each and every order and each and every delivery executive is, in real time, on the ground. So a huge amount of data processing work is required here. What it helps us do is reduce those bad orders that we would otherwise have taken, and it also helps us avoid leaving orders on the table.

Now, this is a visual representation of how the system works. On the left side, for every n-minute interval we have computed how many orders we can take in that interval, and the green graph shows how many we actually ended up taking. As you can see, around lunch time we are operating very close to what we could have taken, and as lunch time fades away, around 2 o'clock or so, the gap between the blue and the green graphs increases. That indicates our capacity is under-utilized at that time, because the orders have been delivered but the delivery executives are still available; they are saying, yes, we are still in the system. Similarly, on the graph on the right side, the points here and here are places where we have hit the capacity, and at those points we would have said, take no more orders. The yellow line and the blue line here are changing dynamically. So it is not just stating the current capacity; it is saying, at this point I can take so many orders because, let us say, all my delivery executives are busy, or at this point I can take so many because they are getting freed up in the next 10 minutes.

The other notion of capacity is point capacity: can I accept this particular order? This is also very important. For example, if you have only 5 delivery executives at a particular point in time, these delivery executives could all be on the east side of the city, because early-morning breakfast orders mostly come from certain restaurants, and on the other side of the city there are probably none. So while these 5 are there, I cannot take 5 orders from the west side of the city, because it will take these delivery executives too much time to travel. This is an extreme example, but it happens all the time, where we say: for this particular order, let us compute whether there are any delivery executives available in the vicinity, either now or expected to get freed in the next 10 minutes. Both of these notions of capacity, as you can see, are fast computations. They are approximate; you cannot afford to get all the data in the world, so you do these computations based on some last known state, ultimately aimed at preventing both ends of the spectrum: bad orders on one side and leaving money on the table on the other.
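A minimal sketch of that point-capacity check, assuming each DE is summarized by a straight-line distance to the restaurant and a predicted time until free; the 3 km and 10 minute thresholds are illustrative assumptions, not Swiggy's:

```python
# Hedged sketch of "point capacity": can we accept this specific order?

from dataclasses import dataclass

@dataclass
class DEState:
    distance_km: float   # straight-line distance to the restaurant
    free_in_min: float   # 0 if idle now, else predicted minutes until free

def can_accept(des, max_km=3.0, max_wait_min=10.0):
    """Accept if any DE is in the vicinity and idle now, or is expected
    to get freed within the next max_wait_min minutes."""
    return any(d.distance_km <= max_km and d.free_in_min <= max_wait_min
               for d in des)

fleet = [DEState(1.2, 0.0), DEState(0.8, 18.0), DEState(6.5, 0.0)]
print(can_accept(fleet))   # True: the first DE is 1.2 km away and idle
```

A fast, approximate check like this can run per order against the last known fleet state.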
The other notion is efficiency. In simplistic terms, efficiency is how many orders per delivery executive per hour I do. There are two notions of efficiency: one is called assigned efficiency, and the other is simply called efficiency; the regular notion of efficiency also includes the utilization of my fleet. Leaving that out, for most of this conversation we will focus only on the assigned efficiency part.

Now, what are my efficiency levers, and how do I control them? The first is obviously who I assign an order to. A greedy approach could simply say: assign this to the nearest delivery executive possible. But the problem is that the nearest delivery executive for this order could have been the best executive for another order, one that is closer to that executive than this order is. So now we are talking about not doing a greedy approach but doing more of a global optimization. There are versions of Hungarian algorithms that we have coded up; I am not going to get into that, because the focus here is not the assignment algorithm but the basis we use to do this kind of optimization. The basis is: I know where the delivery executive is, I can estimate the time this person will take to travel, and I also have to account for the possibility that this person may reject the order. He may say, I do not want to do this order; to a certain extent we do allow this behaviour, because delivery executives have certain preferences. They may not want to go into a certain area, or maybe it is very late, they were just planning to log off from the system and an order came; rather than accepting that order, they would reject it. There can be various situations. So now we are talking about building those possibilities into our prediction algorithms, building them into our predictions of how much time it would take to travel to the restaurant, and so on.

The other lever is when to assign. For example, if I have to serve an order and I see a delivery executive who is one kilometre away, I have a choice of assigning the order to that executive right now, or I can wait for some time and see whether another delivery executive may become available who is closer than one kilometre to this order. So the time dimension is another lever.

Then, can I batch orders? We talked briefly about batching. What enables batching is that the two or three customers I am going to deliver to have to be close enough: the orders have to be close enough in terms of both the source, which is the restaurant, and the destinations, which are the customers, and also in the time dimension. You do not want to batch two orders if they came 20 minutes apart; you want to batch two orders that are close in the time dimension as well. The reason this is important is: can I predict whether a specific order is capable of getting batched with another order, given all these constraints? So again, you are looking at a lot of data, learning from the past and taking real-time input. For example, if there was already an order in the system that came in five minutes back and satisfies all three criteria for the order coming in now, I need to be able to make that decision: yes, there is an order which can potentially be batched with a lot of certainty.
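A hedged sketch of those three batching conditions, closeness at the source, at the destination and in time, with all thresholds invented for illustration:

```python
# Illustrative batching-eligibility check; the thresholds are assumptions.

from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def batchable(o1, o2, max_src_km=0.5, max_dst_km=1.0, max_gap_min=10):
    """Each order: dict with 'src' (lat, lon), 'dst' (lat, lon), 't' (minutes)."""
    return (haversine_km(o1["src"], o2["src"]) <= max_src_km      # restaurants close
            and haversine_km(o1["dst"], o2["dst"]) <= max_dst_km  # customers close
            and abs(o1["t"] - o2["t"]) <= max_gap_min)            # close in time

a = {"src": (12.970, 77.590), "dst": (12.980, 77.600), "t": 0}
b = {"src": (12.970, 77.590), "dst": (12.981, 77.601), "t": 6}
print(batchable(a, b))   # True: same restaurant, ~150 m apart, 6 minutes apart
```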
Now, it is important to predict this batching not just because you want to know what the capacity and efficiency of the system are going to be like, but also because you want to promise an SLA to the customer. Imagine there is no such order already waiting to be paired; then you ask the question: what is the chance that in the next 10 or 15 minutes another order will arrive which satisfies these three conditions? These two cases are what we call first-order and second-order batching, and in both cases we again do a lot of data analysis, both real-time processing and building data science models to learn these behaviours.

The other drivers are those different legs of the journey we talked about: order-to-assign time, first mile time, last mile time, preparation time, customer-to-customer distance, batching probability. Each of these is a data science model for prediction, and each has a specific error in it; I will talk a little about errors as we go along. The important thing is that all of these models work in unison for us to be able to promise an SLA to the customer, and the errors in these models add up. The second important thing is that some of these models are used for internal efficiency and internal predictions, while some are used to make a promise to the customer. So we sometimes end up building different models to optimize our systems and to make the promise to the customer.

The other way we look at data at Swiggy, especially in the delivery team, is with respect to trade-offs. For example, if I want to make my efficiency very high, I would want to batch orders together. Now, when I batch two orders, I give them to a single person; there might be another delivery executive who is waiting for an order but gets none. So this is a trade-off against the experience I give to my most important resource, which in this case is the delivery executive. I may end up giving them a suboptimal experience and they may end up leaving our platform; they are here to earn money, and we are impacting their earning capacity. So if I make my operations very efficient, I may compromise on the experience I give. Similarly, if I batch two orders together, the order that came in earlier could possibly have been delivered first; that order is made to wait because there is a second order we are going to batch it with. So there is a trade-off there: the first customer probably still gets the order within the promised time, but it takes more time than it could have. Similarly, when I want to make an assignment, I want to find the most optimal combination of orders and delivery executives; but while I find the most optimal combination, I may end up starving some orders. Some orders may never fit in. Let us say I have only 50 delivery executives; especially at peak times, this happens all the time: I have more orders than DEs, and if I keep super-optimizing, the orders that are, let us say, away from where the delivery executive density is may end up getting starved. I cannot let that happen, because for those few customers whose orders are starving, the experience will be very bad.
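Coming back to the global assignment step mentioned above: the talk refers to coded-up variants of the Hungarian algorithm, and as a toy illustration only (not Swiggy's implementation), SciPy's off-the-shelf solver shows how global optimization beats the greedy choice on a small matrix of predicted travel times:

```python
# Textbook illustration with SciPy's Hungarian-method solver.
# cost[i][j] = predicted travel time (minutes) for DE i to serve order j.

import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([
    [4.0, 9.0, 7.0],   # DE 0's ETAs to orders 0, 1, 2
    [3.0, 8.0, 2.0],   # DE 1
    [6.0, 5.0, 9.0],   # DE 2
])

# Greedy would give order 0 to DE 1 (3 min) and force worse picks later
# (total 15 min); the global optimum minimizes the sum instead.
rows, cols = linear_sum_assignment(cost)
for de, order in zip(rows, cols):
    print(f"DE {de} -> order {order} ({cost[de, order]:.0f} min)")
print("total minutes:", cost[rows, cols].sum())   # 4 + 2 + 5 = 11
```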
So these trade-offs are very important in every decision we make. There is an efficiency versus speed trade-off, which we talked about: when I batch two orders, at least one of them sees a larger delivery time. Similarly, there is a speed versus compliance trade-off, especially in prediction models. When we build our models (I will talk a little about why there is natural variance in the data), no matter what model we build there is going to be a noticeable amount of error, and we have a choice: train the model so that for most customers we deliver within the promised time, or optimize the model so that we are as close to reality as possible, in which case it will still break in the extremes. Those choices, again, are very important.

So now we know that capacity and efficiency each have their own logistics challenges, and they have a heavy dependency on the way we use data, whether for predictions, for real-time processing, or even for debugging and understanding what is happening in the system. So what are the challenges? One is obviously location capture accuracy. It is a very common problem in logistics: because of device accuracy, signal presence versus absence, battery drain considerations and so on, location accuracy itself is not very good, and that is one of the problems we work on consistently. Then there are customer locations and restaurant locations themselves: while we do everything we can to make sure they are accurate, there are instances where either the customer has not entered the right location or, in the case of restaurants, for some reason or other the lat-longs are not precisely where they are actually located, and even a slight deviation in these locations has implications for our models' predictions.

One thing that is very important in the logistics space, not just at Swiggy, is the accuracy of data. Everything else being right, the nature of capturing a lot of this data from physical human beings is that they are all driven by their own motivations. Unless we incentivize each of these data input points appropriately, we are bound to get signals that are explicit but not accurate. One of the things we focus on very heavily is how accurate our data is, because if you do not fix that, your models learn from wrong inputs, and that is an error you do not want in the model. An example of DE behaviour: yes, he may want to save battery and so does not update his location. But it could also be that the DE expects his food to be prioritized by the restaurant if he sends the signal that he has arrived at the restaurant, although physically he may not have. In his understanding, that makes his order get prioritized in the restaurant's stack, so his order gets prepared faster and he will be very efficient. On the other hand, this is how human dynamics work: restaurant owners are smart people themselves, so they know that some DEs, or maybe many DEs, are doing this. They also know that they have an offline presence.
There are customers sitting physically at their tables, and they are also getting orders from outside Swiggy, whether offline or from our online competitors; we do not have access to that data, and restaurants are optimizing for those orders as well. So restaurants have an incentive, in some cases, especially during peak time, to wait until the delivery executive shows up. Now you see it is a chicken-and-egg problem: the delivery executive says he has already reached when he actually has not, and the restaurant waits for him to come and only then starts preparing the food, because then it is guaranteed. All of these are things that cause incorrect data input into the system.

Here is an example, plotted on kepler.gl. We instrument each of the data points: where a certain event happened, what time it happened, and so on, and this is an output of that. The white dot here is a restaurant. This happens in some percentage of our restaurants, not all, but where it happens, it happens really badly. These are the pickup events, where the delivery executives marked the food as picked up; those are the lat-longs around there. The white dot is from our offline method of gathering the restaurant location: when you onboard the restaurant you get their name, address and so on, and we also get the lat-long there. This kind of analysis clearly shows that there are restaurants in our systems which are off from their actual location. We do this analysis at scale, figure out which restaurants are greater than x metres away from the centroid of the pickup locations, and that is how we go and proactively fix the restaurant locations.

This is another example, again done on Kepler, which shows the DE behaviour. We were always aware that some of our DEs mark their arrived locations away from the restaurant. The denser regions around the white dots are the actual restaurant locations, and the cloud of points you see around those denser regions is where many of the DEs ended up marking the reached location. While we were always aware, through plain statistics, that x percent of our DE-marked arrived locations are potentially geofence-breach conditions, when you visualize it in this format it becomes very clear that when the problem happens, it can be pretty bad: some of these locations can be as far away as one or even two kilometres from the actual restaurant location.
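A sketch of that centroid analysis, assuming we have the raw pickup-event lat-longs per restaurant; the 150-metre threshold here stands in for the "x metres" in the talk:

```python
# Flag restaurants whose onboarded location is far from the centroid of
# their pickup events. The threshold is an illustrative assumption.

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def flag_restaurant(onboarded, pickups, threshold_m=150):
    """onboarded: (lat, lon); pickups: list of (lat, lon) pickup events."""
    c_lat = sum(p[0] for p in pickups) / len(pickups)
    c_lon = sum(p[1] for p in pickups) / len(pickups)
    return haversine_m(onboarded[0], onboarded[1], c_lat, c_lon) > threshold_m

pickups = [(12.9352, 77.6245), (12.9355, 77.6247), (12.9350, 77.6243)]
print(flag_restaurant((12.9401, 77.6301), pickups))   # True: ~0.8 km off
```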
So what do we do about it? We have done a lot of product and tech interventions: things like automated detection, location geofencing and input verification. If the DE says I have reached, has he actually reached? Can you look at the restaurant lat-long and the current location and actually judge whether he has reached? And you sometimes have to do this even in the absence of a strong GPS signal, which is when you make the decision based, again, on data analysis: what is the probability that he is telling the truth? You cannot block a DE just because location inaccuracies exist. I could be standing here and my location could show up on the other side of the road; that is very much possible, a few tens of metres of granularity, sometimes even 100 metres. You cannot tell a DE, you are lying and I will not allow you to mark reached, especially if he has actually reached. So you have to accommodate those errors, and a lot of data processing and error handling happens to probabilistically judge whether the DE has actually reached the location or not. And then, obviously, we have some operational interventions.

These two graphs show our last mile times and O2D (order-to-deliver) times. What I am trying to highlight here is the variance: the data is highly varied, spread across the x-axis (the numbers at the bottom are hidden). This is not necessarily a sample from all zones; it is from a sample zone where the variance is very high, but the variance itself is inherently there in the system. The idea is that, as I said earlier, there are so many external factors affecting variance, and not every external factor can be captured by a data science model, because you do not have data for it. You can develop the best models; even the best model will have some error, but here there is so much natural variance that even your best model will have errors. So what do you do about it? Obviously this limits our prediction accuracy, and while we do chase accuracy, the way out is not only that; you cannot do nothing about it. One of the things we have tried is to proactively account for the inaccuracies. For any decision you take, for example when to assign or which DE to assign, if your prediction accuracy is, let us say, two minutes, and two DEs are one and a half minutes apart, then making a choice between them is not going to make a major difference. In practice, when such a decision has to be made, we assume the best possible choice there.

Then, identify the sources of high variance: it could be load on restaurants, it could be DE behaviour, it could be the current stress on the system, and there could be many hyperlocal scenarios like a traffic jam in a certain place. The point here is that some data, like how many tables of a restaurant are occupied right now, is not something we have or will probably ever have. But other things we do have, like how many Swiggy orders are at the restaurant and what the items in each order are: is it ice cream, is it biryani, is it rotis, a lot of rotis? Each of these food items has different preparation characteristics.
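One plausible way to encode the earlier point that two DEs within the model's error band are indistinguishable; the tie-break on idle time is an invented criterion, purely for illustration:

```python
# If ETA predictions carry ~2 minutes of error, DEs whose predicted ETAs
# differ by less than that are effectively tied; break the tie elsewhere.

MODEL_ERROR_MIN = 2.0   # assumed ETA model error, in minutes

def pick_de(candidates):
    """candidates: list of (de_id, predicted_eta_min, idle_min)."""
    best_eta = min(eta for _, eta, _ in candidates)
    # Everyone within the error band of the best ETA is considered equivalent.
    band = [c for c in candidates if c[1] - best_eta < MODEL_ERROR_MIN]
    # Illustrative tie-break: prefer the DE who has been idle the longest.
    return max(band, key=lambda c: c[2])[0]

print(pick_de([("A", 6.0, 12.0), ("B", 7.5, 30.0), ("C", 11.0, 45.0)]))   # "B"
```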
Multiple of those inputs are used to increase the prediction accuracy.

Next I come to verification: now that we have so many algorithms and so many data-focused decisions, when we want to change the system, how do we actually verify whether the change is working? We start with a hypothesis: if we run algorithm 1 versus algorithm 2, here are the metrics that are expected to increase, decrease, move favourably or unfavourably. And there are metrics that are my guard rails: I do not want to go bad on NPS, I do not want to go bad on, say, order-to-deliver time for some cases, and so on. So you fix these hypotheses, then you build the solution, do the debugging and implementation, and, very importantly, do a lot of instrumentation: every single aspect that could possibly be affected by the changes you are making needs to be captured.

Then we do a very important thing called simulation. Simulation asks: whatever is happening in the real world, can I make it happen in software? Can I simulate delivery executive movements? Can I simulate the order rates, and which areas the orders are coming from? Can I simulate the behaviours of delivery executives, for example: at a certain rate I will reject x percent of my orders, or I am going to log off at a certain point in the day at such-and-such a rate? These are the different variables your simulation environment can handle. So you run the change through the simulation environment. One of the most important things we have learnt here is that it is a real challenge to keep your main code and your simulation code in sync; as much as possible, we try to develop them together.

Then we do a shadow-mode execution. Each of these logistics experiments involves human beings who are executing on the ground, so it is very hard to just run simulations and say what the result is going to be. Mostly, simulation gives directional trends: if I do x, this metric will improve in the range of z to p percent. It may not tell us the precise improvement when we actually run it on the ground. So you start running in shadow mode, where you are getting real-life data, but the people using the systems do not see the effect. You are not influencing anything on the ground, but you are gathering enough data to do analysis, both real-time and offline. And then finally you run on-ground experiments; simulation and on-ground experiments are something I am going to talk a little more about. Finally, when the on-ground experiments are successful, we roll out.

Now, before we get to the next slide, I have a small video; I will just show a snapshot of it. This is how our simulation runs; it is a visual representation. All the green markers are the delivery executives who are free; the red ones are those with orders assigned. You see those red lines shortening where the delivery executives are moving towards their destinations, and the blue ones where they are moving towards the restaurant.
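A toy skeleton of such a simulation loop, with order arrival, DE rejection and trip duration reduced to simple random draws; every rate here is invented, and the production simulator is of course far more detailed:

```python
# Minute-by-minute toy simulator: orders arrive, free DEs may reject,
# accepted orders keep a DE busy for a random trip duration.

import random

random.seed(42)
NUM_DES, TICKS = 20, 120     # fleet size, minutes to simulate
ORDER_PROB = 0.7             # chance a new order arrives each minute
REJECT_PROB = 0.1            # chance a free DE rejects an offered order
busy_until = [0] * NUM_DES   # minute at which each DE becomes free

served = unassigned = 0
for t in range(TICKS):
    if random.random() < ORDER_PROB:                          # a new order arrives
        free = [i for i, b in enumerate(busy_until) if b <= t]
        takers = [i for i in free if random.random() >= REJECT_PROB]
        if takers:
            busy_until[takers[0]] = t + random.randint(15, 45)  # trip time
            served += 1
        else:
            unassigned += 1

print(f"served={served} unassigned={unassigned}")
```

Swapping in different order rates, rejection rates or fleet sizes gives the kind of directional read-outs described above.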
So this is how we run simulation. In the top left corner you will see, for a certain zone, how many DEs are free at any given point in time; the ratio there represents the stress on the system, and the other counts show how many free and busy DEs we have, and so on. This kind of system gives us directional results: if I take a certain algorithmic decision, what is my likely output going to be?

This graph represents our results on simulation versus actuals. You can see the blue and red lines; the red line is the actuals and the blue line is the simulation results. They are not completely on top of each other, not duplicating each other, but the trends are very much the same. Similarly, here is the histogram; the red one is the actuals and the blue is the simulation. Again you see the same observation: to a very good extent, we have been able to simulate what is happening on the ground.

The last part is how you run on-ground experiments, and on-ground experiments are a huge challenge in logistics because, as I said, there are too many external factors, too many factors you do not control. So how you choose a test set versus a control set is a significant problem. The first strategy we tried was simple pre-post: we run without any changes for x days, then we run with the changes, and we observe the differences. This strategy typically works well if the parameters show significant impact, and if you have eliminated any large variable changes; for example, if an IPL match happens, the number of orders increases drastically, or if there is a festival, the supply of delivery executives decreases drastically. So you eliminate some of those large variables, and then this gives okay results, but the impact has to be large enough to be observable. The second one we tried was alternate days; this worked a little better than pre-post, because across alternate days the variation is much smaller than it is over a two-week period. The third is control versus test zones: can I find a zone that is very similar to the zone in which I am running the experiment? One zone is control, the other is test, and we run them simultaneously for multiple days. In each of these experiments there is some bias and some variance; it is very hard to eliminate them, and mostly these designs are suitable only for large variations. Then we have gone into more sophistication: we have tried time slicing, where we apply a certain strategy for five minutes, then change it for the next five minutes, and so on, or at a coarser granularity like one hour; and randomized selection of orders, where on certain orders we apply one treatment and on others another, and then we measure the bias and variance of each of these strategies. The last strategy, which also works a lot, is randomized geospatial selection: can I run one strategy in a certain geolocation while in a nearby geolocation within the same zone I run a different strategy?

Now, one of the most important things here is network effects: if I take a certain decision for one order, is it going to impact another order or not?
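Minimal sketches of two of those splitting strategies, time slicing and a stable hash-based randomized order split; the five-minute slice and the hashing scheme are assumptions for illustration:

```python
# Two illustrative treatment-assignment rules for on-ground experiments.

import hashlib

def treatment_by_time_slice(minute_of_day, slice_min=5):
    """Time slicing: alternate strategies every slice_min minutes."""
    return "A" if (minute_of_day // slice_min) % 2 == 0 else "B"

def treatment_by_order(order_id):
    """Randomized order selection: stable hash-based 50/50 split."""
    h = int(hashlib.md5(order_id.encode()).hexdigest(), 16)
    return "A" if h % 2 == 0 else "B"

print(treatment_by_time_slice(12 * 60 + 3))   # which arm is live at 12:03
print(treatment_by_order("order-12345"))      # stable per-order assignment
```

Note that, exactly as the network-effects point warns, a per-order split is only valid when orders do not influence one another.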
An example: if I change my assignment algorithm and I assign one order to a specific DE, it has actually affected another order for another DE; that order can now never be assigned to the other DE, which means the behaviour for the other DE has changed. Because of these kinds of network effects, some of these test and control mechanisms can never be applied. So in logistics it is very important to know how to create the isolations. For example, if I promise one SLA to one set of customers and another SLA to another set, their behaviours can be independent, so there you can do the order-wise split. But if I batch, let us say, two orders, those two orders are influencing each other, which means the second order cannot be batched with a third one in the other set. These are the kinds of things we have to be very careful about.

Quickly, on visualization: when we plot a lot of these points in a library like kepler.gl, we get insights that aggregate analysis does not give us. For example, in this case you will see that the blue points are where this restaurant at the centre, the white dot, is serviceable, whereas the orange and red points towards the outside are not serviceable. You will see some blue dots even at the edge, and that led us to ask why they are there, why those red and blue dots are so close to each other. In this case we found it is because of U-turns: in some places you require a U-turn, while at a lat-long on the other side of the road you may not, and that makes a difference. There is another example where the visual representation gave us a lot of insight. On the extreme right, these are outlets of a restaurant chain; our customers are not doing a great job of choosing which outlet. Let us say this is a Chai Point (it is not, but let us say it is). If I show the customer three options for Chai Point which are all nearby and serviceable, the customer does not necessarily make the best choice. He does not know that this one is probably only 500 metres away and the other one is one and a half kilometres away; the overlaps are showing that. So it is very important in logistics to do these kinds of visualizations, without which you are restricted to aggregates.

A quick summary of learnings: know the input data very well; know the sources of variance and account and accommodate for them; minimize input inaccuracy by fixing the data at the source; understand the trade-offs; and use simulations and experimentation very rigorously. Without statistical analysis of each of these experiments, it is not possible to make any kind of progress in logistics. Questions?

Thanks, Piyush. We can take a couple of questions. Anybody has questions? Okay, maybe it is the post-lunch effect. All right, Piyush is around, and if you have questions you can ask him; there will be other discussions happening. So we will head into a birds-of-a-feather session. Can we stop the live stream now? Okay.
So I would like to welcome on stage Sudipta Chaudhary. Can we please settle down so that Sudipta can start speaking? Can we please settle down? Okay. I would like to welcome Sudipta Chaudhary to The Fifth Elephant; thank you for accepting the invitation to present. I suppose yours is one of the sessions that people have been looking forward to, so let us hope it is an interesting discussion. Shall I? Thanks, thanks.

Hi, good afternoon, good evening, whatever it is. I am at that juncture where we are starting the session, but I think there are some more sessions after this. All right, cool. The way I had thought of it was that I would put up something which talks about the journey and all that, but understanding the crowd and the expectations you have, I felt I should make this more interactive with you rather than putting up a boring PPT, because you are actually writing thousand-line, two-thousand-line, ten-thousand-line codebases; it does not make sense for you to sit through such presentations. Anyway, if there is anything that requires me to hook back to the slides, I will go back to them, but otherwise I will refrain from referring to the slides and make it a more interactive session. It is a small crowd, so it makes sense to pose your questions in between, just by putting your hand up; do not wait for the last 10 minutes. Let us make it interactive wherever you have questions about what we are trying to talk about.

Just to give you an understanding: I come from an organization whose core business is not analytics, whose core interest is not analytics. Then why was analytics brought in? What was the reason for an organization in telecom to get into analytics? You are developing all these interesting products, doing high-end, cutting-edge technology development; why are such big things happening in the world not touching big organizations? That needs to be understood first. Any organization will be interested only when something impacts its bottom line or top line; unless it touches those, there is no need. Telecom as a business was growing fantastically well: the 90s had a fantastic run, and the first four or five years of the 2000s had a very good run. It was all acquisition-led; there was a huge amount of competition and a huge amount of market to capture, so there was no need for anything; the gut calls of the managers were good enough for delivering that extra growth. There was no need for analytics, because the growth, whether I took a fairly good decision or a very, very good decision, would have been in the positive zone, from 12 percent to 14 percent quarterly. How does it make a difference? How do I know what the potential is? So automatically people were not interested. But when the market started stabilizing, when it became hyper-competitive, when it showed there were no more subscribers to go for: Tamil Nadu as a circle had mobile penetration of more than two times its target population base. Those were the days of multiple SIMs, dual SIMs, triple SIMs; there were Chinese handsets with six SIM slots. In that market, obviously, things were changing, and then the question came:
how do I create a differentiator? So organizations, and Idea as well at the time, undertook many transformation projects, and analytics was one such transformation project. Just to give you an understanding: today we talk of petabytes of data and all that, but at that time, I am talking about 2008-2009, when I joined Idea and took up this charge of analytics, even the definitions used for defining a customer were not in place. Who is a customer? How do I define a customer? Not just from the perspective of whether the mobile number exists; is it a revenue-earning subscriber, what type of subscriber is it, is it a zero-usage subscriber? Those definitions needed to be put in place. So the warehousing and the data were all put in place, and then the journey of analytics started. I can say the trigger for such absorption of analytics always comes when there is a business need, and the business need in those days was competition, hyper-competition, I would say 9 to 10 operators in each circle. So where is the pie to chew from? That is where the market started changing. The focus then became churn management: a huge number of subscribers were churning, moving from my operator to another operator, so put something in place to hold them back.

Here I also need to highlight one thing. Whenever you embark on an analytics journey, and I am sure you are now developing something, productionizing something, leading some organization, which is perfectly fine, it is important for you to estimate the impact of your work in the overall scheme of things, and to tell the organization: this is the impact you are expected to have if you go ahead and do this. How do you prioritize your impact areas? That is very critical and crucial. Just to give you a feel: there are hundreds of different use cases, and hundreds of different operators and partners come to us on a regular basis stating, we have done this in such-and-such country, we have done this with such-and-such operator, so why can we not do it in your organization? How do we choose the right thing? How do you make the right pitch? How do you understand the business problem before you make the right pitch? How do you take these up and prioritize them? Those are the areas that need to be understood.

So anyway, we embarked on this journey and identified projects and areas. This slide gives you some understanding of the different areas in the telecom world where analytics can have an impact. Obviously, when I go and talk to my senior management, if I say something myself it might not be internally valued, but when I say that Gartner says so, it gets valued; it may well be Gartner doing the pitching here, but anyway. So brand value is very important when you work in a big organization: who is speaking, who is coming and pitching matters. You have to get some right names behind you to put across the point that you want to drive. So anyway, we identified the
broad areas where we could make an impact. Now the question came: fine, this is important, so start doing it; what is the big deal? There is a problem, data is already coming in, so start giving us models. What I found in this journey, and I am not getting into the whole journey here, is that it is very important to understand how we set up this whole ecosystem, and that is where you need to understand what strategic decisions need to be taken so that you can build it in a manner that is sustainable. And when I use the word sustainable: I was personally involved from 2009 until date in setting up the whole thing, and let me be very frank with you, I have seen people come and join just to enrich their CVs. So it is important for the organization to take a holistic, long-term view rather than encouraging something that is not sustainable. You need to understand whether you are pitching something for the here and now, or something that will impact the organization in the long run; that makes a lot of difference. If you have to make the whole thing sustainable, then you need to take decisions that are futuristic.

With that statement, let me go back one slide. If you see this, the evaluation phase of everything we were doing was pretty long, and today's generation might not be interested in such a long evaluation period; you would say, if I spend one year on evaluation, I could have delivered something really fast. But please understand: in big organizations it is very difficult to change your path midway. If you have taken a decision, you have to make it a successful one. These days we talk about fail fast, learn, recoup and run again; you cannot change direction so easily in big organizations. You have to prove your mettle and deliver, and only then can you go forward; otherwise you lose face, leave the job and move on. So you have to be very sure of what you are proposing. The evaluations we did were with big names. Today we talk of Python and all that; in those days it was SAS, and IBM had brought in SPSS. We had a thorough evaluation and finally went with SAS as the platform. In the case of iris, what I am talking about is the CVM, the campaign management and analytical ecosystem; we brought in McKinsey as the partner to lay out how the whole campaign management ecosystem would look. When we went about the BI, IBM was our technical partner and brought in DB2 as the warehouse. I mean to say, these are all long-run, effective decisions that need to be taken; you cannot take a short-term vision and make calls that will not pay you in the days to come.

Coming back to the questions we asked ourselves, see for yourself: if I went with an external partner, what got delivered for me could be pitched the very next day to my competitor to win that account. So a big organization will be very cautious, taking the steady step where it is sure that what it is doing stays with it to some extent and is not replicable, not copyable. That is where a quick, agile organization and a big organization slightly
differ. So that is the first question we answered: to keep the IP within, we got the whole thing developed in-house. Another thing about setting up analytics: most interestingly, the requests come from the marketing function. So let us do something within marketing; and in a telecom company, sales is another area, service delivery is another, finance and the revenue assurance department another, network another. So why can I not have multiple analytical instances in each of these functions? We could have that, and it is very easy for everybody to have their own fiefdom, I am calling it a fiefdom, where my analytics resource does analytics for me, and what gets done in the name of analytics is MIS publication. To avoid that, it is important that the organization takes a call, values analytics in the true sense, and sets up an analytics wing or division or department that serves across the different functions, so that it is able to gather the knowledge, understand the problems, criss-cross the learnings and make it a more enterprise-wide analytical venture. That is where the question came of whether we should have spread instances across each and every circle; we have 22 circles, each with its own operations and geography, and we have these different functions. So that is the call that needed to be taken.

Again, when it comes to campaign management, a very easy way out is out-of-the-box solutions: you pick something up, plug it in and start rolling out, and many organizations even today come and say, we can give you a solution in no time; you just plug in and start working. The problem, if you go with that, is that their having brought a product which has worked in Europe or the US does not necessarily mean it will work in the Indian prepaid telecom scenario. And this is applicable not only to telecom; it is applicable to any product you develop in your own domain. If the product you claim is out-of-the-box is not relevant for the market, the product, the company or the culture of the organization you are working for, it is not going to be accepted very easily; you will get into a situation where every now and then you are asked to change. To avoid that, it is important that you understand the geographical and cultural tenets of the organization before you propose a product to be taken forward. I am giving a simple example: if yours is an Indian organization, the numbers the product puts out might be reported in millions, while your user thinks in crores. I am saying it can go to that micro level: if you show him the product reporting in crores on the very first day, you might find he relates to it. Those are the small things that need to be taken care of whenever you are taking something out of the box and customizing it. In our case, the whole solution was customized, to the level that we said: we are looking for a platform; you give us the capacity and the capability, and we will say how. We became the product managers, and the partners became the engineers who actually delivered it, and delivered really well.

A very important aspect of starting the analytics journey is measuring the returns. It is very, very important, because how do I justify it the next year when I ask for an extra budget, saying I want this much?
They will ask: what did you do, what was the effect? So it is important for us to justify the effect, and there are two ways to do it. One is obviously through monetary terms; if you can justify it in monetary terms, well and good. But you may be doing something not so tangible, because many times, when you make an analytical proposition, the campaigns run on the basis of it, or some strategic decision taken on it, might not show impact the next month; it might show over a period of time, and it may be difficult to quantify. So how do you make your work valued? Find ambassadors within the organization who can talk about your work to the right people at the right point in time, so that you get the right budgetary allocation the next time. That is one route we took, because it was very difficult for us to prove the value we were giving in tangible terms on a day-to-day basis. Whenever I say a campaign is run based on analytical models, the model says this is the guy who is going to churn, and this guy did not churn after being given a product, who claims the revenue out of that? Is it me, the analytics guy? Is it the product guy who developed the construct of the product, in this particular case a 26-rupee recharge that gives 35 rupees of talk time? Or is it the guy who communicated it, because he wrote the script? There is always a clash between these different teams over whose revenue it is. To avoid that, decide who the owner of the whole program is and, if possible, try to see what your share of it is; and if you cannot do that, please find good advocates, the right advocates, for the work to ensure it is taken forward. That is very critical for the journey to begin and to take a concrete size and shape.

And obviously, driving adoption means you have to make a cultural change in the organization. People will say, I know this better than you guys; you know something about statistics or modelling and all those things, but you do not know the dhanda, the business; that is the usual terminology we get to hear: do not come and try to preach to me. So how do you drive adoption? You generally say, okay, fine, I completely agree with you; let us do a test. On one small base you run your campaign, or you give us half the base and we run ours, and let us see how it performs. When the results come out, it becomes much easier to convince the person and onboard him onto the journey we talked about. That is the route we took to ensure that analytics got driven, and analytics got driven in our organization not from the top. We always keep hearing that analytics needs to be driven from the top, but I have personally found it works better when it comes from the bottom. There was a need for analytics coming from different corners of the organization; there were people asking for these things, and obviously it was supported by the top, which is the reason we are here and have grown from a two-member team to a 40-member team. But the actual ask came from the bottom; that made it really grow, and they became the people who are using
it. Those who were actually doing product work started doing analysis. They were not doing analytics, they were simply doing analysis, but in the name of analytics they could also make the point that they were doing analytics; whatever it is, they were doing it too. That is the way things went about changing.

Now the question is: what do we do in the name of analytics? What is it that we do? What is your feel, in a telecom scenario, of the things we could be doing? Someone here must have worked in a consumer-facing industry: what are the areas where we do our work, and what is the nature of the analysis we do? How can we excite the business managers with something they did not think of? Obviously churn, cross-sell and upsell will be the areas they ask for, and you have to deliver. But how is the excitement created? Because analytics is something that has to excite; if you cannot excite them and just hand over models and say, I give you revenue, they are not going to listen too much. So you excite them with something really funky alongside your regular, mundane work, because otherwise they are not going to sit through your 45-minute or one-hour presentation listening to things they feel they already know. It is important to excite them with the things they do not know and you know better. Can you give me some examples where he does not know and we, as the analytics fraternity, can guide him in a better manner?

A very nice example, just taking that word and running with it: campaigns. Today campaigns are mostly not run on SMS; SMS is no longer a very effective medium, but three to five years back SMS was the most effective way of communicating. Now, if I am sending an SMS in a language the person cannot read or relate to, it obviously does not make any sense. So it is important for me to communicate with him in the right language, and to do that I need to know his language preference. Hardly five to ten percent of subscribers, a bare minimum, actually record their language preference with Idea. So what about the rest? If he is in Maharashtra, is everyone Marathi-speaking? Absolutely not: Nashik has a huge migrant population coming from UP and eastern Bihar. How do I find them? I look at their travel patterns, their calling patterns, their usage patterns, their news subscription patterns, Lokmat versus other news channels, and thereby I can infer that this guy might not actually be a Marathi speaker but a speaker of some other language. That helps me identify him and then push an SMS in his language, if his handset supports it, and that makes a difference. You can see in this way how analytics makes a difference in every aspect of that communication. Another angle: communication has different time points, with different response rates. Some people respond more if you send the SMS in the evening, some respond more in the morning. Again, if you can prioritize those time slots, the communication becomes more effective. If you find out these things, insights generated from data, talk to the business, and then go and propose your thinking, it creates much more of a connection; people start saying, okay, fine, he is talking sense, he understands what my problem is, he is
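A deliberately naive sketch of that language-inference idea; the features, weights and threshold are all invented to show the shape of such a rule, and the real models were certainly far richer:

```python
# Hypothetical rule: score a Maharashtra subscriber's likely language from
# behavioural signals (calling circles, travel, content subscriptions).

def likely_language(profile):
    """profile: dict of behavioural signals, each scaled to [0, 1]."""
    hindi_score = (0.5 * profile["calls_to_up_bihar_frac"]
                   + 0.25 * profile["travels_to_up_bihar"]
                   + 0.25 * profile["subscribes_hindi_content"])
    return "hindi" if hindi_score > 0.5 else "marathi"

migrant = {"calls_to_up_bihar_frac": 0.8, "travels_to_up_bihar": 1,
           "subscribes_hindi_content": 1}
print(likely_language(migrant))   # "hindi" -> send the SMS in Hindi
```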
And obviously you have to do this not with code or jargon. You have to come down to their level and say: if I had sent this SMS at such-and-such time, this is the take rate; if I go with this model, this is the increase in take rate. Then they connect, they are able to relate. So somewhere you need to understand and appreciate that, and give them that comfort zone, for you to take forward the work you have done. That was the way we went about creating excitement around small pieces, so that the work could be taken forward.

Just to give you a feel for it, this is the nature of information a telecom company will have. Usage-based information they will have any day: how much time you spoke, whom you spoke to, where you spoke from, which handset you are using. These are all transactional data. So how do you make it more interesting? You go beyond this and say: I am not only giving you this, I am also trying to give you age. Age is not available to us. Can you believe that in today's day and age we do not have the subscriber's age with any confidence? Why? Because when we go and fill up the form, many times we do not give the right documents; the retailer submits his own. So where do I get the age from? How do I get the gender? That is where you try to understand the subscriber in a better manner, from the different footprints he leaves: whether he is ordering on Amazon, whether he is going and collecting the product himself, whether he is browsing Netflix for long hours. From all of this you try to infer his age, his gender and other attributes, and create a persona of that person. That is the nature of the activity that goes on behind the scenes to excite people. And it is not only the footprint he leaves; there is also the digital profile we can build of a subscriber, to understand the person better and then pitch to him in a more effective manner. I am just taking some marketing use cases for you to understand the nature of the analysis; I am not getting into the other aspects, network, sales, customer service and so on. I am just giving you a few to understand it.
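A minimal sketch of this persona-inference idea, on synthetic behavioural features invented for illustration (late-night data hours, video-app share, and so on); a real pipeline would train on the subset of subscribers whose age is reliably known and validate out of time.

```python
# Minimal sketch: predict a missing attribute (here, an age band) from
# behavioural footprints. Features and data are synthetic illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [late_night_data_hours, video_app_share,
#                         recharge_amount, voice_minutes_per_day]
X_train = np.array([
    [3.5, 0.70, 149, 10],   # heavy late-night video, low voice -> young
    [0.5, 0.10, 399, 60],   # low data, high voice, big recharge -> older
    [2.8, 0.55, 199, 15],
    [0.8, 0.15, 299, 45],
])
y_train = np.array(["18-30", "45+", "18-30", "45+"])

model = LogisticRegression().fit(X_train, y_train)
print(model.predict([[3.0, 0.60, 179, 12]]))  # likely "18-30"
```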
Now, that is the way the whole thing got developed. But what is important for you to appreciate and understand, and what I believe (let me deliver a few sermons here, it might sound that way), is that the five pillars of any organization's success are these. To begin with, you need to have the right technology; otherwise nothing goes through. As I explained in the first part of my session, technology needs to be forward-looking. It can't be just here and now; in big organizations you can't build something only for today's requirement, it has to be for the future.

People, that is another very critical pillar. I have seen people joining us enthusiastically, I have seen people leaving us, I have seen people joining again, and of late I am seeing people leaving again, so I have seen the full cycle twice. So why do people leave, and how can you make people stay? You are in an age where people switch for any reason at all; believe me, people are leaving for any reason whatsoever. Gone are the days when people worked in one organization for generations; that is over. So people will leave, and this community in particular leaves at a much faster pace, because the market is hot. It is not that they have a chip on their shoulder; it is that the market is hot and there is always demand for these resources, so they are able to leave. The same person would have stuck to the job for longer if there were no demand, let me be very candid about that. They do not wait for the annual increment, because whenever they change jobs they get a 20 or 30 percent hike. So how do you hold them back? How do you ensure they stay with you?

The important thing I understood, and tried to inculcate, was this. The modellers being hired, many of them freshers from premier institutes (and from not-so-premier institutes too), were given tasks where they were not differentiated; they were made to do the end-to-end modelling job. What I mean by end-to-end is this: organizations typically take a call where the junior people do the data cleansing, the slightly senior people do the modelling, and the more senior people do the talking, that is, they go out and bring in the projects. What I felt, after some initial interactions, was that these freshers are actually looking for a holistic experience. They do not want to do the cleaning job for the first two years of their career and then move on to modelling and then to the business development piece; they want to do it all from day one. And believe me, those people are in no way inferior to someone with four years of experience. Do not give them a very complex problem; give them a comparatively simple problem, but make them do the end-to-end job: he pulls the data himself, he develops the model, and you make him the owner of the model. That way, I have seen them stick to the job. And that way, I have also seen them leave the job, because they are now ready for the market. So it is a catch-22; you decide how much you want to invest in making those people stay. But looking back over the long run of my career, I feel it was a good decision, because in some way those people's careers, their ability to develop as resources, and thereby the fraternity in which we work, have all grown. That is the way I see it. But obviously, when one fine day someone just comes and says "I am leaving", it is a very difficult situation. People are a very critical piece of this whole journey, and I am sure everybody here is either facing this or doing it. So please be very careful: that 10 or 15 percent hike, and moving because of it, is not the right decision, I can be very sure of that for anybody. Look at the profile, the growth opportunity and the learning opportunity before you make any decision of that sort. And even if your team members are considering it, please ensure they do not decide on that basis alone.

Data, obviously, was not a big problem for us, because we had data in every possible format that we wanted, so that was not an issue. Now, the last two pillars, deployment and adoption, were very critical for our journey. The reason is that if our work is not deployed, if our work is not streamlined and flowing without any hindrance into the campaign management ecosystem or the sales app, then the whole work we have done is meaningless.
So, just as I made the junior-most person responsible for the work he does, in terms of data collection, data preparation, data modelling, out-of-time testing, out-of-sample testing, and going and selling it to the business, he also has to be responsible for ensuring that the model is deployed, and for ensuring that the value the model generates is seen; only then does he come and talk about it. In his annual appraisal, you don't come and say "I have cracked such-and-such code". Very good, you have cracked it, well and good. "I have cracked that business problem." Fine, fantastic. But I will appreciate it only when the thing you have cracked is deployed, only when the deployed status is a regular-run status, only when the business has adopted it. If the business has not adopted it, that means you have created something for your own fun and your own liking, and I don't value it. So it is important to give that message to the people in your team: you don't create to enrich your CV, you create something to be valued by the organization, and that is what we are all working for. Once that message went out loud and clear, I found that the nature of models that got developed, and the nature of problems that got tackled, was adopted by the businesses far more readily. Obviously, for their intellectual challenge there was always something or the other given, but it was always linked to the business problem, the bigger problem we were trying to solve. That is how we took care of the people problem we were talking about. And adoption is obviously very important; that is how we ensured adoption happens from the business side, and from the people developing the work: if you don't get it adopted, I don't value the work. So that was that bit.

This slide shows how we went about developing models; the process itself is not important here. But there is one aspect I need to talk about, in the rightmost bottom corner: models developed with around 10 variables. Why this concept of 10 variables? This is something very important. I felt it was important for me to convince my IT counterpart, because if I hand over a solution with 30 or 40 variables, my modeller is very happy; he says "I have done this transformation and that transformation and I have been able to get that extra lift in the model". Fantastic, very good. But being young, energetic, enthusiastic team members, they do not appreciate the fact that a model with 30 or 40 variables is very complex, becomes very difficult to deploy, and is much more difficult to run on a daily basis. So the one-line message to them was: whatever you develop needs to be deployable; it should not happen that IT comes back and says "no, we can't deploy it". And I found, very categorically, that if I remove those extra 15 or 20 variables and keep the model to a basic 10, 12 or 15 variables, it does not make a lot of performance difference. Even where the performance difference is analytically measurable, it does not make a lot of difference in overall revenue terms. So the model you have developed might be very complex, very interesting, very cutting-edge, but if it is not able to give a lift in model performance, and furthermore that lift cannot be translated into revenue performance, then we don't value it.
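The trade-off described here is easy to demonstrate. A minimal sketch on synthetic data (not the speaker's actual models): train once with all 40 variables, once with only the top 10 by importance, and compare AUC.

```python
# Minimal sketch: quantify how little lift the "extra" variables buy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: 40 candidate variables, only 10 of them truly informative.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])

top10 = np.argsort(full.feature_importances_)[-10:]      # keep 10 variables
small = RandomForestClassifier(random_state=0).fit(X_tr[:, top10], y_tr)
auc_small = roc_auc_score(y_te, small.predict_proba(X_te[:, top10])[:, 1])

print(f"AUC with 40 variables: {auc_full:.3f}")
print(f"AUC with 10 variables: {auc_small:.3f}")  # usually within a whisker
```

The point of the speaker's argument is exactly this comparison: if the analytical gap is small, the revenue gap is usually negligible, while the deployment cost of the larger model is not.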
What I am trying to tell you is that, as an analytics fraternity, we get excited by using a lot of interesting data; we get excited by something we read in some journal from Norway and want to implement in our own organization. But if it is not delivering the value it is supposed to deliver, it is very difficult for people like us to accept it, so please bear with that. That is where the complexion of the game changes slightly. I know that when I am on the other side of the table I would also want to do those fantastic jobs, and it is tough for me to say no to the suggestions that these young, bright, enthusiastic, energetic people, sparks in their eyes, come and make. But I cannot accept them always, because in the end I have to ensure the environment runs, and many times I know they curse me behind my back.

This is where I was talking about deployment. Deployment is very critical, and the whole process has to be, and has been, streamlined. The model building I am talking about is the pipeline where around 100-odd variables get converted into 700-odd variables; that process is streamlined, the code is defined in the system, so it is all taken care of that way, and you cannot change it very often. If you have to change something, you come and make a proposal, and the quarterly update of the model environment will incorporate your suggestion. (A sketch of this kind of variable expansion appears just before the Q&A below.)

Now, just to give an example of the overall CLM, this is a very basic, rudimentary slide. The overall customer lifecycle management works through analytics, but there are different stages at which analytics touches it, and the important thing we always come down to is: how is it going to be measured, how are you able to make an impact on the whole thing? That is the whole piece, put very simply here; it is not so simple in reality. Today, unlimited plans have taken a lot of the charm out of telecom products, but when we were running campaigns in the mid-2017-18 period, there were around 4,000 campaigns being run on a monthly basis, and running 4,000 campaigns a month is not an easy thing. The campaign management ecosystem had to be geared up to run those, so all our analytical modules had to flow into that system, and we had to ensure the communication was in place and things went accordingly.

So I can close the talk here and leave the last five minutes for you to ask questions; I had thought this would be interactive, but I rather zoomed past. These are the different things I feel were important for our journey and for making it a successful one. I touched upon most of them, and the rest is for you to read, but these are mostly the areas where we went about making it successful. That is the way we went about it at Idea Cellular. Today, as in the last three months, we are working more towards bringing the two databases of Vodafone and Idea together, and towards making one big analytical platform for both organizations. But more importantly, for me, as I have said, the whole analytical journey at Idea, even if a little limited, was pretty robust in the telecom domain and going well. So this is where we went, and this is how we went about it. In a very short 50 minutes I could capture this much; I leave it open for you to ask any questions, so that I can take them up.
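Before the Q&A, here is the promised sketch of the kind of streamlined variable expansion described above, on invented column names: each base usage variable observed over three months is expanded into seven derived features, which is roughly how 100-odd base variables become 700-odd model inputs.

```python
# Minimal sketch of variable expansion: each base monthly variable becomes
# several derived features (latest value, averages, trend, volatility, ratio).
# Column names and window length are hypothetical.
import pandas as pd

def expand(df):
    """df: one row per subscriber, columns like 'voice_m1'..'voice_m3'
    (m1 = latest month). Returns ~7 derived features per base variable."""
    out = pd.DataFrame(index=df.index)
    bases = sorted({c.rsplit("_m", 1)[0] for c in df.columns})
    for b in bases:
        m = df[[f"{b}_m{i}" for i in (1, 2, 3)]]
        out[f"{b}_latest"] = m[f"{b}_m1"]
        out[f"{b}_avg3"] = m.mean(axis=1)
        out[f"{b}_max3"] = m.max(axis=1)
        out[f"{b}_min3"] = m.min(axis=1)
        out[f"{b}_std3"] = m.std(axis=1)
        out[f"{b}_trend"] = m[f"{b}_m1"] - m[f"{b}_m3"]
        out[f"{b}_ratio"] = m[f"{b}_m1"] / (m.mean(axis=1) + 1e-9)
    return out

df = pd.DataFrame({"voice_m1": [100, 20], "voice_m2": [90, 25],
                   "voice_m3": [80, 30],
                   "data_m1": [2.0, 5.0], "data_m2": [1.5, 4.0],
                   "data_m3": [1.0, 3.0]})
print(expand(df).shape)   # 2 base variables -> 14 derived features
```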
The elephant in the room: Jio? Yes, the elephant in the room, Jio. So what has it done? Let me put it lightly: it has already ensured that many telecom companies are no longer around, and many big telecom companies were forced to merge; we are one of them. So there has been an impact, but look at it from the perspective of how telecom itself has changed: per-minute billing going to per-second billing, then voice becoming free, then unlimited plans coming into the picture. That is obviously a big change in the telecom market. The learning for analytics here is that we might create a lot of analytical assets and take pride in them, but one market disruption creates a flux and washes away all the analytical assets you have created. Jio ensured that all the analytical assets we had created over so many years got flushed out, and we are now running very fast to build our new assets. That is one impact on analytics that I can comment on.

Next question: I actually have two questions. The first one is, considering the Vodafone Idea merger, how does the new analytics architecture shape up? I am assuming Idea had one stack and Vodafone had something else, so they might have been following different approaches. Can you share some details? Okay, the thing is, this one line has a huge implication: cost has a huge implication on the architecture decision. We would not like to go on day one and create a new architecture and a new warehouse and start pumping the data in; that is not the way we are going ahead. As I said, my first principle is that it has to be robust and built for the long run. So the way we are going ahead now, all the reporting requirements of the organization are being served individually by the two data warehouses and taken forward from there; at a definition level you can take care of both, bring them onto one platform, and start reporting there. When it comes to the analytical problems, initially they are being handled by independent models running on the two data warehouses. Going forward, the data from the different source systems, the network and the other systems, is going to come into the big data ecosystem, and we will have to bring it all to the same schema, the same architecture. Not today, not tomorrow; it has to happen eventually, but right now we are hard-pressed on network investments. So that covers architecture and ETL.

And can you give one example of how a customizable solution impacted a business decision you had already made? Sure: our whole campaign management ecosystem. I was running 4,000 campaigns on Unica, which is pretty well established; they run campaigns globally, and there was no complaint with it as a solution. The problem was that our segment managers wanted to do something different: they wanted dynamic campaigns. That means I create a static campaign, and then I get triggers in between, and a subscriber has to be moved out of one campaign and into a dynamic campaign. Now, this is a very simple thing in your mind and mine, but many of these platforms do not have the capability to do it. And even when they can do it, they cannot tag the revenue earned from that subscriber to the new campaign from the day the movement happened. These small nitty-gritties ensured that we moved to a customizable solution, and a big differentiator behind the quarter-on-quarter growth that Idea had was IRIS, the campaign management ecosystem we put up. IRIS, by the way, stands for Increasing Revenue through Intelligence of Segmentation. The whole thing worked really well in those quarters when this customized solution was in full flow.
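A minimal sketch of the dynamic-campaign mechanic just described, with hypothetical campaign names and dates: a trigger can move a subscriber mid-flight, and revenue is attributed to whichever campaign was active when the revenue event happened.

```python
# Minimal sketch: trigger-driven campaign membership with revenue tagged
# from the date of the move. All names and events are hypothetical.
from datetime import date

membership = {}   # subscriber -> list of (campaign, start_date), chronological
revenue = {}      # campaign -> accrued revenue

def move(sub, campaign, when):
    membership.setdefault(sub, []).append((campaign, when))

def book_revenue(sub, amount, when):
    # Attribute to the campaign active on 'when' (the latest move before it).
    active = [c for c, start in membership.get(sub, []) if start <= when]
    if active:
        revenue[active[-1]] = revenue.get(active[-1], 0) + amount

move("sub42", "static_offer_A", date(2018, 1, 1))
move("sub42", "dynamic_offer_B", date(2018, 1, 10))  # trigger fired mid-campaign
book_revenue("sub42", 199, date(2018, 1, 15))
print(revenue)   # {'dynamic_offer_B': 199}; revenue tagged from the move date
```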
Next question: my question is around something I heard earlier. You were talking about models being robust, and that if a change had to be made, you have to come in with a proposal, they take it up in the quarterly review and then deploy it. Now that there is an elephant in the room and you have to move quickly, how does that change? Does it have an impact on how your engineering works, or on how your data science works? That is a nice question. The thing is, though we say these are our models, and the quarterly review takes care of all the proposals and makes the necessary process changes, on a continuous basis there is always a champion-challenger approach involved. So if something is happening, that takes care of it. But with the recent changes happening in the market, we also have self-learning models deployed, which try to take into consideration the change happening in the market, and then you take it up and run things accordingly. Currently we are moving more towards that type of environment rather than the traditional one. But more importantly, and this is worth appreciating, the focus of telecom analytics has substantially shifted away from customer churn management. We no longer do that type of work; we moved out of it over the last one year, and that is entirely because of the market change. It is more of an optimization game today; the whole of analytics is happening in optimization mode: if I have this much resource, how do I set targets for the retailers; if I have this much resource, where do I put my network. All of that analytics has changed because of the market change. So rather than the modelling approach changing, it is the models' objective that has changed.
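A minimal sketch of a champion-challenger setup, assuming a hash-based traffic split and an illustrative promotion rule; the actual routing and KPIs used at Idea are not described in the talk.

```python
# Minimal sketch: route a fixed fraction of subscribers to the challenger
# model and promote it only on a clear win. Shares and thresholds are
# illustrative assumptions.
import hashlib

CHALLENGER_SHARE = 0.10   # 10% of traffic goes to the challenger

def assigned_model(subscriber_id):
    """Deterministic split so a subscriber always sees the same model."""
    bucket = int(hashlib.md5(subscriber_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < CHALLENGER_SHARE * 100 else "champion"

def maybe_promote(champion_take_rate, challenger_take_rate, min_uplift=0.02):
    """Promote the challenger only on a clear, sustained uplift."""
    return challenger_take_rate >= champion_take_rate + min_uplift

print(assigned_model("subscriber-001"))
print(maybe_promote(champion_take_rate=0.11, challenger_take_rate=0.14))  # True
```

The deterministic hash keeps a subscriber's experience stable across campaign runs, which matters when take rates are measured over weeks.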
Right, fine. Anything else? Is there any other question? That's fine. Okay, so thank you so much for this powerhouse presentation at 4 p.m. in the evening; thanks a lot for accepting our invitation and coming over. Thank you. Okay, next up we have Kaushik Bhatt, who will be making a set of remarks on the challenge of data governance from his perspective.

Thanks, Zainab. I know I am standing between you and your time to leave. When Zainab and I were talking about this topic, data governance was the topic we wanted to pick up, and I know this is a group of data scientists and data engineers, so talking about data governance here is probably a dry subject. So first of all, I am not going to put up any slides; I don't want to bore you with a presentation. By way of introduction, my name is Kaushik Bhatt, I work for Wells Fargo, and I am responsible for data management and governance. I say governance because it is an area that we as a bank have started to focus on. For those of you who don't know Wells Fargo: it is the fourth-largest bank in the US in terms of revenue, and among the top 10 banks in the world as far as assets are concerned.

So let me quickly talk you through what governance is, what we do, where it started, and how we see it playing out in India as well; that is where I am going to drive the topic. We have been doing data management for the last 20 years as a bank. We have a data warehouse; you can name a technology and we have it, right from structured and unstructured data to data lake environments; we have done everything. But when it came to 2016, we as a bank were asked by regulators, "can you give us some numbers?" By the way, that is where it started, and I cannot talk in detail, but essentially, for our analyst community, getting one single version of the truth was difficult. Even today, if you go to the data scientist community and ask "can you get me good data?", a lot of the time we see that data scientists spend their time just finding good data. We saw that the success of a data platform comes down to the right data, in a form where it can be consumed faster. So our effort was really to make sure that the data assets we have can be consumed effectively, one hundred percent.

This is where the program got started. I am part of the chief data office; our job is to look after the data lifecycle, right from data origination through processing to consumption, starting from knowing your data. Let me give you an example. Throughout the day I heard people talking about all types of data being ingested, right from location data to telco data; we talk about monetizing, about social engineering, and all sorts of data get into a system. First things first: do you know what you are getting into the system? As a company, and especially in product companies like yours, you have clear use cases for what you bring in, but a lot of the time data scientists will bring in data for monetization across multiple use cases without really knowing whether they need it or not. In our case, and for most banks, we would end up creating data lakes that turned into data swamps; we had a lot of data and we did not even know why we had it. That is the first problem. So from 2016 onwards we have been going back to our data. We have deployed what I would call a data scan mechanism, so that whenever we get data, we know what data it is; if it is sensitive, if it is PII, or if it is confidential, we know it right there. Our endeavour is to make sure that every policy of the bank, or of any financial institution, is encoded as part of the scan itself. Let me give you an example of knowing your data: in the Indian context, you want to make sure a phone number and a PAN card number are never compromised. Say you are a fintech dealing with Aadhaar-based validation: PAN card details, date of birth, location. This is super sensitive data, and if it gets into the wrong hands you will potentially lose the customer's trust. We all know what happened to Facebook and what is happening with various other companies.
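A minimal sketch of such a scan-at-ingestion rule set. The patterns are simplified illustrations: real card detection would also apply a Luhn checksum, and a real scanner would work over columns and metadata, not single values.

```python
# Minimal sketch: tag incoming field values as PII/sensitive at ingestion
# using pattern rules. Patterns are deliberately simplified.
import re

RULES = [
    ("credit_card",   re.compile(r"^\d{16}$")),            # 16-digit card number
    ("amex_card",     re.compile(r"^3[47]\d{13}$")),        # Amex: 15 digits, 34/37
    ("pan_card",      re.compile(r"^[A-Z]{5}\d{4}[A-Z]$")), # Indian PAN format
    ("phone_in",      re.compile(r"^[6-9]\d{9}$")),         # Indian mobile number
    ("date_of_birth", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]

def classify(value):
    """Return the sensitivity tags that match an incoming field value."""
    v = str(value).strip()
    return [tag for tag, pattern in RULES if pattern.match(v)] or ["unclassified"]

for sample in ["4111111111111111", "371449635398431", "ABCDE1234F", "hello"]:
    print(sample, "->", classify(sample))
```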
So as a bank, what we decided is that we want to be responsible, not only towards our data scientist community; we wanted to go out and tell our customers that we are responsible in how we handle their data. We deployed three things. First, our data lake architecture with profiling systems. Second, scalable data classification: if I get a value that is a 16-digit number, I know it is a credit card number; if I get a 15-digit number starting with 34 or 37, I know it is an Amex card; if I see a date of birth, I know it is a date of birth, and likewise an address tied to a person. So right from ingestion we deployed data classification and tagging. We also made sure, the way Amazon does cataloguing (and I am sure you have deployed something similar in your data platforms), that every data asset is properly catalogued; we deployed AI/ML algorithms on top of products there, so that every data asset entering our system gets catalogued properly, and every transformation of a data asset from ingestion to consumption is captured; we have MDM and lineage systems, which is another thing we have invested in. Last but not least, data protection: whenever we find, in the US context, an SSN or an address, we make sure the data is masked. There are a few things we do: tokenization, encryption, and in some cases masking. These are different protection techniques, and each has its own merits and costs. We ensure that if you have access to data at all, your role demands it; the business owns these policies, there is a steward assigned to each policy, and whoever gets access gets it for a defined purpose; that is part of our policy.

Now, this whole lifecycle can look like a barrier to the data scientist community. If you ask me whether it slows us down: our experience has been that it does not really slow things down, it actually makes them better. If you address the quality of the data, and the data itself is properly catalogued, the search mechanism for data scientists becomes easier. We saw the benefit of doing this: we were able to get at our data assets much more easily, we were able to find the right data for our analytical projects, and we were able to make sure we are responsible in the way we handle it. So that is what we have done. I really cannot go into the specifics of products, but if you have questions we can take them, or we can talk through the specifics. And yes, we are in Bangalore, Chennai and Hyderabad.
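A minimal sketch of two of the protection techniques just mentioned, masking and tokenization, with an in-memory dictionary standing in for what would really be a secured token vault.

```python
# Minimal sketch: masking (irreversible display form) vs tokenization
# (reversible via a vault). The vault here is a plain dict for illustration.
import secrets

_vault = {}   # token -> real value (in practice a secured token vault)

def mask_card(card_number):
    """Irreversible display form: keep the last 4 digits only."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

def tokenize(value):
    """Reversible (by the vault) surrogate: same value -> same token."""
    for token, real in _vault.items():
        if real == value:
            return token
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

card = "4111111111111111"
print(mask_card(card))                   # ************1111
print(tokenize(card))                    # stable tok_... surrogate
print(_vault[tokenize(card)] == card)    # the vault can resolve it back -> True
```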
There are a couple of questions from my side. The first is about data retention: you said you have a discovery mechanism to identify the data sets, but what about hot versus cold data, the storage trade-offs there, and what is the SLA to make cold data hot again? That is one question, and then I will ask my next one. So, data retention policies are defined for the bank per se, in terms of what we want to keep as hot data; there are policies around operational data, and policies around what we keep in our lake. Based on the use case, we make sure the data is either hot, meaning available to our end consumers, or kept on tape backups for retrieval. If your question is how we really ensure it: our transactional data is kept in our data lake environment, and our referential and financial data sit in the warehouse, where we keep three years of history at any point in time. Transactional data retention depends on the use case; I do not have specifics, but we hold up to a year of data, and in some cases we keep a month's data for a given use case. And the second question is the cost of ingestion: you have transactional databases and data stores, and you have data lakes, but what is the cost of ingestion, because you are already ingesting into your traditional warehouses, and then you need to ingest again? So actually, when we invested in the data lake, our first strategy was to design the lake around the concept of a raw data store, so a lot of the data feeds that were traditionally going to the warehouse are now getting fed to the raw zone of the data lake.