Good afternoon. My name is Aninde Shankar Day and I work for Walmart, at Walmart Labs here. As the title suggests, I'll be talking about how at Walmart we are using machine learning to solve various business problems, with a deep dive into our automated, self-learning forecasting system.

Walmart is the largest retailer. We are present in 28 countries with approximately 11,500 stores, 5,000 of them in the US alone, handling more than 260 million transactions every week and creating petabytes of data every week, which we need to process to make our business decisions. Our founder Sam Walton once said: exceed your customers' expectations. If you do, they will come back over and over. Give them what they want, and a little more. We are following in those footsteps using machine learning: we are finding out what our customers really want. We have millions of customers, and every customer's perspective differs from the next, so if we tried to find out what each and every customer wants we would get a huge variety of answers. To find out collectively what our customers really want, we have to go back and process the data we collect on a daily basis.

Let me give some examples of how and why we are using machine learning. Walmart is a retailer, and in this world of e-commerce we still have physical stores. Using machine learning, we are able to find out in which market, in which city, there is a need among customers for us to open a new store. In the earlier times of Sam Walton, when we wanted to open a new store in a particular location, he used to go up in a plane, and from that aerial vantage point he would decide the optimal location for a store. In this digital era we have data to facilitate that, and we have built a system which, on demand, tells us using machine learning: this is the market where there is a need for you to open a store, and at this exact location a store will be helpful for Walmart as well as its customers.

It is not only real estate; a huge amount of machine learning happens on the customer side as well, especially in omnichannel mapping, where we have a real-time capability. Whenever a customer transacts in one of our offline stores, if he has an account with Walmart.com, then within a few seconds of his transaction at the store we can map it to his online profile without him doing anything. You can be a random customer: you walk into a Walmart store and do a transaction, and if you have a Walmart.com account as well, we have the capability to match you back to that account. From a customer-experience point of view, when you go to the Walmart.com site and look at your past orders, you see not only your online orders but your offline orders as well, which gives a much more holistic experience.

We have talked about the physical stores and the customers; machine learning also happens day in and day out in merchandising, the products people actually come to buy at Walmart. We have a system which tracks which products are available in the market, which of them Walmart is carrying and which it is not, and then decides which is the best item that Walmart is currently not carrying but could carry. The system gives this recommendation to our buyers.
We also find out, when a product is not available, which product is the most suitable substitute wanted by customers, and the system recommends those as well. Now, Walmart is not an OEM. We do not manufacture anything; we sell products created by OEMs, and so our logistics operation is another huge part of the business. Our supply chain is one of the largest supply chains anywhere, and machine learning there is used in its most demanding form. Demand forecasting is one of the most critical pieces of work: we forecast what the sales, the amount of demand, will be for a particular item at a particular store over the next few weeks, which I will go into in detail after this. We have developed systems which tell us when a particular supplier will not deliver a product on time, that there will be a delay in its lead time, so that the stores can prepare alternate products for that point in time. And as you know, we have ventured into delivering to the customer's home, where we are using machine learning in a huge way to find the optimal route and the optimal way to deliver our products directly to customers' homes, so that they can live better.

We use standard machine learning algorithms like deep learning. We use graph theory quite a lot. We use clustering. We use optimization techniques like genetic algorithms, and many more. And more often than not we use home-written algorithms: we have scenarios where standard algorithms do not work, so we go back to the whiteboard, do theoretical research, and build our own proprietary models.

Now, the main part of this talk is our forecasting system. Before we go into it, let me talk about traditional machine learning, where machines start to learn on their own. Traditional machine learning usually revolves around things as complex as image processing, facial recognition, and pattern recognition as a whole; or natural language processing, where we make a computer understand what we speak; or a system that simply distinguishes between two types of customers. But one thing is common from the very complex image processing down to a simple distinguishing algorithm: all of them are classification systems. We have data in place which has labels, and we train an algorithm to classify new data into those very labels. We do not usually talk about building a machine that can forecast the future. We do not talk about forecasting systems. And there is a reason: it is much more complex than a classification system. Even if you consider just a few items in Walmart and check their sales patterns across the weeks, you will see many, many varieties of patterns emerging, and it is hard to build a system that handles all of those at once. Even if you take just one item, one with a smooth time series, we have to estimate its trend component separately, its cyclical component separately, its seasonality separately, take care of the irregular part, find out whether it is affected by outside factors, and only then find which model gives the best forecast for this time series. Only after all of that do we get the forecast results. So, more often than not, forecasting one time series becomes a huge job; it becomes a project in itself.
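[Editor's note: to make the decomposition step concrete, here is a minimal sketch in Python using statsmodels' STL on a synthetic weekly sales series. The series and parameters are illustrative stand-ins, not Walmart's data or code.]

```python
# Minimal sketch of classical time series decomposition on a synthetic
# weekly sales series; not Walmart's production code.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Fake three years of weekly sales: trend + yearly seasonality + noise.
rng = np.random.default_rng(42)
weeks = pd.date_range("2015-01-04", periods=156, freq="W")
trend = np.linspace(100, 140, 156)
seasonal = 15 * np.sin(2 * np.pi * np.arange(156) / 52)
sales = pd.Series(trend + seasonal + rng.normal(0, 5, 156), index=weeks)

# STL splits the series into trend, seasonal, and irregular (residual) parts.
result = STL(sales, period=52).fit()
print(result.trend.tail(3))     # slow-moving level of demand
print(result.seasonal.tail(3))  # repeating yearly pattern
print(result.resid.tail(3))     # what's left: the irregular component
```

Doing this once is easy; the point of the talk is that each of the 5,000-stores-times-a-million-items series needs its own treatment, which turns decomposition into a systems problem.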
Now imagine: we have 5,000 stores in the US, and every store carries at least one million items. So what we are talking about here is building a system that does time series forecasting for each of those 5,000 times a million combinations, and not just time series forecasting: it does the forecasting by finding the best model for each and every one of those time series, all in parallel.

The project started with our requirement to reduce buffer inventory. So that we do not go out of stock, we often keep buffer inventory in our stores, so that when there is a huge spike in demand we are able to meet it. But it does not always work out that way, and it is not always good to create lots and lots of buffer inventory. Inventory that is not getting sold takes up shelf space from items that customers do want, and we fail to meet customer demand because we are holding inventory of an item that is not selling. So we needed to build algorithms that do store-item-level forecasting as accurately as possible, and for this reason we collected various internal and external data for building our forecasting algorithms.

People familiar with time series might know that even if you take just the historical values of a series, it is possible to build good models using univariate methods. But consider this scenario. Take a particular item, a snow shovel. Everyone knows it sells mostly in the winter season, and there is a smooth seasonality to snow shovel sales in that time frame; a univariate time series model can predict that. But suppose a snowstorm warning has been issued in a particular region. If a snowstorm is coming, people will flock to our Walmart stores to buy snow shovels, and that huge spike in demand will lead to an unexpected jump in sales which a univariate time series model will not be able to capture. That is something we incorporate into our current system. So we collect various internal and external data sources, we create multiple time series models on top of them, we build out our hardware and software capability so that we can scale up to Walmart scale, and then we choose the best model for each store-item combination. I'll be going into a deep dive of the architecture after this.
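[Editor's note: a hedged sketch of the snow shovel scenario, not the production models. An ARIMA-family model can take the storm warning as an exogenous regressor, which is exactly the kind of signal a purely univariate model misses. The data, model order, and the `storm_warning` flag below are made up for illustration.]

```python
# Sketch: adding an external "snowstorm warning" regressor to an
# ARIMA-family model on assumed weekly data; all values are illustrative.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
weeks = pd.date_range("2016-01-03", periods=104, freq="W")
storm_warning = pd.Series(rng.binomial(1, 0.05, 104), index=weeks)
base = 50 + 20 * np.sin(2 * np.pi * np.arange(104) / 52)
sales = pd.Series(base + 40 * storm_warning + rng.normal(0, 4, 104),
                  index=weeks)

# A univariate model would miss the storm spikes; the exogenous flag
# captures them explicitly.
model = SARIMAX(sales, exog=storm_warning, order=(1, 0, 1)).fit(disp=False)

# Forecast 4 weeks ahead under an assumed warning in the second week.
future_warn = pd.Series(
    [0, 1, 0, 0],
    index=pd.date_range(weeks[-1] + pd.Timedelta(weeks=1),
                        periods=4, freq="W"))
print(model.forecast(steps=4, exog=future_warn))
```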
This is where I want to focus. We don't just fit one time series model for each store-item combination. For each of them, 5,000 stores times a million items, we have a system that builds multiple models, finds the optimal model for each, and that gives the forecast. I'll explain why I call this a self-learning system in the architecture part.

At the start of any machine learning process, the first step is data gathering. We collect various data from internal and external sources using web crawlers, where we mostly use Python for the crawling; our database is Hadoop; for processing in Hadoop we mostly use Hive; and for statistical processing we mostly use R. Once we have all the data, we go into feature engineering and selection. We create various features from the data itself, and we have built an algorithm which automatically selects, for each store-item combination, the features that need to be included in the model. And these features are not only based on the data we have collected; I will come back to this. We also convert our time series themselves into features: there are ways to extract information from the time series data and turn it into non-time-series features as well. Once we have this, we build various forecasting models, some of them standard, like dynamic linear models and ARIMA with Fourier seasonality, and in this system we have lots of home-written algorithms as well.

Now, when the system was initially deployed, how did it work? For every store-item combination, it created multiple models, and then, based on the historical data it had, it chose the best model for each time series and did the forecast. I am not yet talking about the ensembling part, because that is where the self-learning mechanism comes in. Once the system was deployed and had chosen its best model, it would go and collect the actual data when the time came and do various kinds of model validation and analysis. The system was analyzing: I forecasted these values for the next few weeks for this store-item combination, and this is what actually happened. We have algorithms in place which go back and find out that the forecast was good during the weeks when these other variables were involved, that the forecast has gone down now, and that these are the things that need to be in the model which are not there right now. The system creates feedback for itself which goes back into the system in the next run, so it keeps updating itself. The feedback goes back saying: we have seen that this variable is not getting picked up by our current model-selection algorithm, but it is important for this store-item combination, so whatever happens, we should be getting that variable into the system. That is one part of the self-learning.

The other part where the system is self-learning is this. At this point we have historical data on which model was best at which time point, and what we do now is ensemble the various models we have created, with weights that keep updating themselves. Based on the historical data, we determine which was the best model and when a given model usually performs better, and the weights continuously self-learn from the various features we have created, across time points, based on the historically best models. So suppose we have five models with weights W1, W2, W3, W4, W5, and historically the first model was better at a particular kind of time point; then that model gets a higher weight when a similar situation arrives. As I explained earlier with the snow shovel: if a model is very good at predicting snow shovel demand, that model will always get a better weight in the ensembling. And, as I mentioned earlier, we have the historically best models and we have created features from our time series, so we can actually predict which will be the best model and use that prediction probability in the ensembling as well.
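[Editor's note: a minimal sketch of the self-learning weighting idea as I understand it from the talk, not the actual production update rule. Here each model's recent error drives its ensemble weight by inverse error, so historically accurate models dominate; the inverse-MAPE rule is an assumption for illustration.]

```python
# Sketch of self-updating ensemble weights based on historical errors.
import numpy as np

def update_weights(recent_mapes: np.ndarray) -> np.ndarray:
    """recent_mapes: shape (n_models,), each model's recent MAPE.
    Returns weights favoring historically accurate models."""
    inv = 1.0 / (recent_mapes + 1e-6)  # smaller error -> larger weight
    return inv / inv.sum()             # normalize to sum to 1

# Five models' recent MAPEs (percent); model 1 has been most accurate.
mapes = np.array([8.0, 15.0, 22.0, 12.0, 30.0])
w = update_weights(mapes)

# Blend the models' next-week forecasts with those weights.
forecasts = np.array([105.0, 98.0, 120.0, 110.0, 90.0])
print("weights:", np.round(w, 3))
print("ensemble forecast:", float(w @ forecasts))
```

Re-running the update after each week's actuals arrive is what makes the weights "keep updating themselves" in the sense described above.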
So this is how the entire system works, and as we go on, the system has stabilized and we are able to create better and better forecasts as the process goes on. That is what we have on the forecasting system architecture, and I am open to questions.

Hello, interesting talk. So I have a question. In the initial slides you mentioned the mapping of the offline data with the online data, right? So how exactly does that work out for a random customer?

That's a long process, and I think it's better if you come back to the Walmart booth; there are people there who can give some guidance on that. It's an entirely different system altogether. Any gist you can give of it? We can match on names; sometimes customers provide names and other details at the POS, and we have the capability of finding the closest match, and we have been very successful at finding the closest match with these things. Thank you.

I have a question about the validation part that you showed. What are the exact metrics that you validate on? You said you compare with the original series; is it MAPE, or what are the other things? We use MAPE. We take MAPE at various horizons, like 3-week MAPE and 12-week MAPE, on the various forecasts, and also something like a hit-rate MAPE kind of scenario: how close was the forecast, was it very close? Because MAPE is an average, we also take a count of how close the forecasts were to the actuals.
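[Editor's note: a small sketch of the metrics described in that answer. The ±10% hit threshold is an assumed value, since the talk does not specify one.]

```python
# Sketch of the validation metrics discussed: MAPE over a horizon, plus a
# simple "hit rate" counting how often a forecast lands close to the actual.
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def hit_rate(actual: np.ndarray, forecast: np.ndarray,
             tol: float = 0.10) -> float:
    """Fraction of weeks where the forecast is within `tol` of the actual.
    The 10% tolerance is an illustrative assumption."""
    return float(np.mean(np.abs((actual - forecast) / actual) <= tol))

actual = np.array([100.0, 110.0, 95.0, 130.0])    # 4-week actual sales
forecast = np.array([104.0, 100.0, 97.0, 128.0])  # 4-week forecast
print(f"4-week MAPE: {mape(actual, forecast):.1f}%")
print(f"hit rate (±10%): {hit_rate(actual, forecast):.2f}")
```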
Hi, I'm Vikash, and I have a question. For different departments you build different things; suppose in a city, for a store, you have a model forecasting some X items, Y items. Then this has to go to some warehouse, where the supply people will have their own models or other things going on. How do you integrate this into the complete supply chain and get the feedback that what you predicted has not been fulfilled? How does this integration happen across all the departments together, for a smooth flow of products into the store? As I said, we do the forecasting at store-item level. This goes back into our order management system, where we have dedicated buyers and logistics people who take care of that part. What we give them is the forecast and how it changes, and there are people who take these forecast values and decide what to do next.

Hi. I had a question on products which do not have history. Let's say you have a television, but from a different brand, and this is completely new. How do you forecast in such cases? I didn't get the question. Let's say you have a television from a completely new player which recently launched in India; I'm taking India's case so that you understand. I don't have history for that brand and so on. So how do you model the time series for that? For very new items we have a separate system in place which looks at similarly selling products and uses them to approximate what the sales of the new product will be, because any new product introduction follows a different pattern; it will be like a Gompertz curve. So the entire modeling architecture for new products is different. And how do you classify a new product in this system? Is it just the model, or is it on the features of the product as well? Is it on the brand, or is it on the features of the product? As I said, this system is not designed for new products; that is a totally different system that is in place and works over there.

I have a question. If certain factors are affecting a product, and some factor suddenly drops down or suddenly jumps up, how will the impact of that variable affect your model? How will you decrease its effect on the overall model? Before we even go into the modeling section, all the data preprocessing, like outlier detection, happens. So if such a scenario occurs, we smooth it out. But is it good to completely remove that data point, or do you have to...? It depends. If it goes beyond thresholds, then we completely remove it; otherwise we can still smooth it out and use it.

Hello. Hey, I have a question. You talked about history-based model selection. Is it that you are running, let's say, five different models in parallel throughout the year, and when winter comes this year you just pick the model that ran best last winter? Is it like that, or did I misunderstand? Partly that. It is not only what has happened in the past, like "at this time point this model was the best"; we have designed the system so that we can actually predict what will be the best model right now, and we can use those probabilities as well. So you predict what is going to perform best at this point in time; what methodologies do you use to predict that? Pardon? What methodologies do you use? Classification. That's classification. Okay. Okay. Thank you.
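[Editor's note: to make that final answer concrete, here is a sketch, under my own assumptions, of predicting the best model per series with a classifier and using the class probabilities as ensemble weights, as described earlier in the talk. The features, labels, and choice of random forest are illustrative, not the production classifier.]

```python
# Sketch: predicting the best model per series via classification, then
# using the class probabilities as ensemble weights.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Toy features per series, e.g. trend strength, seasonality strength,
# coefficient of variation (assumed feature set).
X_train = rng.random((200, 3))
# Toy labels: which of 3 candidate models was historically best per series.
y_train = rng.integers(0, 3, 200)

clf = RandomForestClassifier(n_estimators=100, random_state=7)
clf.fit(X_train, y_train)

# For a new series, the predicted probabilities become ensemble weights.
x_new = rng.random((1, 3))
weights = clf.predict_proba(x_new)[0]
model_forecasts = np.array([102.0, 96.0, 110.0])  # forecasts from 3 models
print("weights:", np.round(weights, 3))
print("blended forecast:", float(weights @ model_forecasts))
```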