Hi, everyone, and welcome. Thanks for joining. My name is Callum Betts. I'm a course lead in the MITx MicroMasters Program in Supply Chain Management, and I'm involved in sustainability research here at the MIT Center for Transportation and Logistics. Our current MITx courses, which I know many of you are part of, are Supply Chain Technology and Systems and Supply Chain Design. So welcome, learners. I hope you find today's event a great addition to your experience in these courses. Before we get started, I want to make one announcement for our learners in SC4x. Today is the deadline to verify for the course. So if you're considering going for a certificate, make sure you register today; there's still time to do so. This is a deadline that we can't change, so please register today if you're interested in going for that certificate. And with that, let's get started. So today we are fortunate to have two expert panelists in the use of data, optimization, and machine learning in supply chain. I'm very excited to be your host for today's event. So let's welcome Xifeng Wang and Rao Panchalavarapu. If you could turn on your cameras and audio, please. Hello, everyone. Hey, Callum. Thanks for having me here. Awesome. So before we do introductions and discussion, let's kick off the event with a quick poll. We'd love to learn more about the audience and why you're here today. Let me just launch that poll here. There we go. So why are you here today? There are a number of different options: I want to learn about data and ML; I want to learn about these topics more specific to retail supply chains; and so on. I'll give you a few minutes to take a look and fill out our poll. Great. Thank you. And while you do so, I'm going to talk a little bit about our agenda for today. 
So for the next few minutes, I'll have our panelists introduce themselves and provide a little background on what they do and how it relates to today's topic. I'll then ask some questions we've prepared, and the last 15 minutes of the event will be saved for your questions. Please use the webinar Q&A, the little Q&A button on the bottom, to ask questions. And be sure you're logged in with a name; I won't read anonymous questions. I'll also share a couple more polls during the event, so be prepared to participate. So first, let's check in on our poll here and look at the results. It looks like the majority want to see how data and ML can improve supply chain performance. That's very interesting. Also, just generally expanding their supply chain knowledge. It looks like everyone is interested in most of these different topics. That's great. Awesome to see some MicroMasters SCM students who don't miss any events on here. And we'd love to hear from you in the chat, the chat button on the bottom. Please introduce yourself if you'd like. And again, please use that Q&A feature if you have any questions as we go along. And with that, let's get started. So Xifeng, can you introduce yourself and share a little bit about your background? Thanks, Callum. Hey, everyone. I'm Xifeng, last name Wang, and I lead the operations research team at GameStop. My area is focused on transportation, fulfillment, S&OP, and procurement, so anything you can name that relates to supply chain. I've been seven months here with GameStop, and I have three major tasks to solve. Number one is to build data sets and KPIs to monitor and evaluate the system, to understand the trade-offs between different systems, and also to help senior leadership design the supply chain network. We have two sides of the business: one is e-commerce, the other is retail stores. 
For e-commerce, I help the business reduce order-to-delivery time and improve consolidation, which you will hear a lot about in e-commerce. For retail, I help the business optimize the replenishment network and drive down out-of-stocks. Last thing, I help the business optimize the existing systems and build new algorithms for the future. That's a little background. Awesome. Thank you, Xifeng. I'm happy to have you joining the discussion today. You know, GameStop is a company that's been in the news a lot this last year or so, and it's also a retail chain that's going through some transformation. So I'm looking forward to hearing about your experience using data and modeling and simulation and optimization. It's awesome. Thank you for joining. And then Rao, could you introduce yourself and share a little bit about your background as well? Sure, certainly. Thanks, Callum. My name is Rao Panchalavarapu. I'm working at Walmart as a principal data scientist, and in that role I'm responsible for developing a variety of optimization models which involve various strategic, tactical, and operational decisions for the company. Prior to joining Walmart, I was at various other companies like Amazon, Starbucks, and Nordstrom. And most of my work experience was with a company in Green Bay, Wisconsin, Schneider Logistics. There I had an opportunity to work with a variety of Schneider's customers, which included General Motors, Ford, and a wide range of companies like Whirlpool. In that role, I was primarily responsible for developing supply chain optimization models which involve decisions like where to locate facilities, which customers should be served from which location, and things like that. So that work experience gave me a lot of opportunities to work with a variety of shippers all over the world, mostly in Asia, and also working in Latin America a lot. 
And of course, in North America we had lots of customers. So that has given me a lot of opportunities to really understand how global supply chains work. As part of my work, I also engage in research with academic institutes. And that's what I've been doing over the years. That's the high level about myself, and if anybody has any questions, I'll be able to answer later, I guess. Awesome. Thank you, Rao. Thank you for joining. Lots of experience in this area, and I'm looking forward to hearing your discussion on data and some of your experience in simulation and optimization. So let's launch into some prepared questions. Building off our introductions, before we dive into specific questions about data and machine learning, I wanted to start with a broader question, starting with you, Rao. What has been your experience with, and what do you see as, the role for data in supply chain strategy, supply chain design, and supply chain operations? Just a broad question; I don't know if you have thoughts on that broad area. Sure, sure. Let's go item by item here. We are talking about strategy, we are talking about data, we are talking about creating impact for the business overall. So, talking about data: I think data is very critical. And nowadays, with all the explosion in developments in software and hardware, there's no problem getting data. But I think the issue is, how do we really take advantage of it? And how do we effectively integrate it with our processes within the company? Also, not all data is created equal. There are some parts of the data which don't make sense at all, because some process, something, might have gone wrong somewhere. So it is important to really collect what is going on across the company in any format that is acceptable within the organization. 
But on top of it, it is also important to make sure that we are collecting it as accurately as possible. For example, if you are moving a load from Seattle to New York and somebody tells me it costs $5 to do that, not per mile, but for the entire load, then there is some problem with the data. So it is important to make sure that we are collecting the right pieces of information, and as accurately as possible. That is part A of the whole story. Then part B of the story is how we really take advantage of this information which we have available within the company or within our business process. To answer that question, there are a variety of things we can do nowadays, again thanks to all the developments in hardware as well as software. I want to give one example to the team here. I have been working in this optimization space for close to 25 years. When I started my first job, to answer a business question like where to open a facility, you really could not solve the problem unless you had a heuristic procedure or some kind of software which took into consideration all these optimization methods and other things. But in the late nineties came all the developments in AMPL and CPLEX, and of course nowadays Gurobi is the talk of the town. There have been lots of developments, and the ability to solve these large-scale optimization problems has become more practical and meaningful compared to the mid-80s and early 90s. When I joined my first job, I really couldn't solve anything more than something like a 100-location, 100-node network to get a best-location solution for the company. But nowadays it's different. 
If I have a thousand nodes and have to construct a network with a million lanes, solving that kind of optimization problem in a real-world sense has become much, much easier. What used to take two hours, and sometimes run over the weekend, is now possible to solve in a matter of minutes, if not seconds. So the central point here, to answer Callum's question: yes, we need quality data. We need data in the first place, and then we need to make sure the data is quality and usable for the business. And the third step is how we really take advantage of this data. With all the developments in computer science and hardware, it is really possible to solve a large-scale optimization model much more quickly, in many cases in real time. And I will leave it there. Thank you, Rao. Lots of great insights to unpack there. I'm sure we can dive into a lot of those details, like some of the tools you mentioned, you know, Gurobi, and data quality, which I know is a critical issue, so I hope we'll have some opportunity to discuss that today, as well as the advances on the computer hardware side of things and how that facilitates new approaches to problems. That's awesome. So Xifeng, I don't know if you had any thoughts on that same question, and if you wanted to maybe build on any of Rao's comments. Yeah, so for me, I read this in the news: I think data is the new oil, the crude oil, and supply chain is like a complex machine. We can run a machine with an ox, we can use electricity, but with data, we have to extract the lubricant from the oil to optimize our system. So for all the strategic and operational decisions, we need data to monitor whether we're doing the right thing, and for the long-term strategy, we have to build optimization and simulation models on top of it to understand, if we make some change A, B, C, what the impact to our business is. 
So with all the data we have, tons of data, we can look at all the details, all the nodes of the supply chain system, to understand where the bottlenecks are and where we should increase capacity. And if we predict that something will happen, then we can make backup plans, especially for holiday planning, when the volume is three or four times more than the regular season. So we need this data to simulate, if we pull some trigger in the system, say in transportation or in the fulfillment centers, what the impact is. And we need this data, and the analysis, optimization, and simulation, to give business leaders a good understanding of how to shift gears. Yeah, those are great insights as well, Xifeng. And I love the analogy of data as the new oil. I think I've heard that a few times as well, and the value of data, obviously, especially in making supply chains run, lubricating supply chains. That's very interesting, I love that. Awesome. So maybe, and I know you touched on this a little bit already, Rao, but could you build a little bit more on how you see the role for data having evolved over your career? You mentioned how the hardware side has facilitated new approaches to problems, bigger optimization problems. Do you have any other thoughts on how that's evolved? Sure, I can definitely take a stab at it. Basically, when I started more than 20 years back, and I'm talking about the late 90s to early 2000s, our ability to really handle something like more than a million rows was considered hard, because Excel in those days had limits even lower than what we have now. I'm talking about Microsoft Excel and CSV files, and we used to use things like Access. 
Of course, we did have some Microsoft products in those days to handle large-scale databases, but anything running into a few millions of rows was considered a very, very hard thing to handle. I don't want to recount the complete evolution, but if I look at today, getting 50 million rows or something like 100 million rows is no big deal, actually. In fact, I have handled more than 500 or 600 million rows of data, and getting insight into that kind of large database, and also using it to create inputs to an optimization model, has become relatively easier with all the tools we have. We're using more of Python and those kinds of tools now compared to 25 years back. I would say the ability to handle large-scale data has become very possible in the real world. On top of it, with all these developments in hardware, we are collecting more and more data compared to what we were able to collect in those days. So that is also helping us to capture specific aspects of what is going on within the supply chain, and to get to the bottom of what is really causing some specific issue, whatever it is. Overall, the summary here is that our ability to collect, process, and analyze data has increased significantly, from just a few millions to hundreds of millions of rows of data. Yeah, that's very interesting insight. You know, we often hear about high-performance big-database technologies, but I don't often hear about the technologies that facilitate the collection of data, because obviously that data has to come from somewhere, and so that pipeline is being filled by a whole upstream piece, which is interesting, and hopefully maybe we can touch on that. Xifeng, I don't know if you had any thoughts on the evolution of data and your role over your career. 
So I totally echo Rao's response on the speed and volume of the current big data environment, because I remember a couple of years ago, when I started, we used a SQL system, and it took me hours, days, or even weekends to run massive customer order data for a simulation. Now, as we use either Google or Microsoft, it takes me a minute to get a result. So I can answer a question very quickly for my leadership, instead of having them wait for a couple of days and then figuring out there's a bug over here. So now my response rate is much faster. But sometimes I have to deal with data issues like bad data or garbage data, because garbage in, garbage out, right? And sometimes we don't have enough data to support the analysis. So some of the challenges I would call out in this field: number one is the connectivity between different systems. For example, we are a team of data scientists working collaboratively, so I have to output some of my analysis to my downstream partners, and they have to do their analysis. For example, if I hand my data to the transportation team, I have to work with them very closely to help them smooth their work, like pulling the data into the format they need, right? That's the first thing, the connectivity between different processes in a business. And number two is that you are always likely to be missing data, because a lot of the time we have to deal with third-party data, and they don't have a good data infrastructure. For example, consolidation centers run by a third party, that's very hard. So sometimes we have to estimate the data to support our analysis. So those are the two challenges. That's very interesting. Very interesting insights. Connectivity is a critical piece that we don't often talk about. Maybe building off that idea, and also some of your comments about the evolution, and connecting it to what we're learning in Supply Chain Technology and Systems: we talk about a workflow for data-related projects. 
The specific framework we talk about is called CRISP-DM. And the basic idea is pretty simple. You start with understanding the business and the data, you prepare the data, then you do modeling, and then you evaluate the model. And there are obviously lots of feedback loops, where you might go back to the beginning or back to different steps. And then finally, you deploy. So maybe, Rao, again starting with you, could you share a little bit about your experience with the workflow of data-related projects, and how it touches on some of these different aspects you mentioned, like cleaning data and big data and those types of pieces? Yeah, so I don't engage myself a whole lot in the workflow, but I can talk at a high level; we have other teams within the company who are actively working on that on a day-to-day basis. Basically, the whole process is pretty straightforward and simple. We have to collect data at multiple points in the supply chain, actually, not just at some point, depending on the facilities. We have hundreds of facilities within the network, so each facility is responsible for capturing what is happening, and we have to integrate all these things in a central place. After that, the real key thing is: are we really capturing the right pieces of information? And this is a continuous process; what I'm talking about here is not a one-shot, one-time process. We may realize there are some issues with the data, so once we collect all these things, we do this processing portion of it, just to understand if there are any major issues within the dataset we are collecting, and whether we need to change the processes so that we will capture it more accurately. 
For example, I can't give you the very specific example, but at a macro level: when we bring a trailer to a door, are we really making sure that the person at the trailer door is capturing the time when they open the door and the time when they close the door? That becomes extremely important, because that will tell us to some extent, even if it is not the only source, what the time taken to unload the trailer is. If we are not creating a process there that forces the person working there to log when they open the trailer door versus when they close it, that will result in an inaccurate dataset. For example, somebody may choose not to log that time at all, or somebody may choose to log it some three hours after they did it. That would naturally provide a wrong piece of information, saying it took a long time to unload that trailer. So we need to put checks and balances in all the processes, at all the stages within the supply chain, just to make sure that we are collecting these pieces of information accurately. And once we get there, then we go through the process of evaluating and understanding: are we collecting the right pieces of data? Of course, that is another strategic-level question, but on top of it, are we collecting it as accurately as possible, like the example I gave you? That is the next phase of it. Then, once these processes are done and this database is made available to teams like the data scientists and other analysts within the company, they may realize that there are some issues with what they are finding. Then the feedback mechanism goes back, and this is a continuous process where we have to improve on a continuous basis. There's no one-shot or one-time solution to this whole process, right? 
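The trailer-door example is essentially a data-validation rule. Here is a minimal sketch of that kind of check in Python; the record format, timestamps, and three-hour threshold are all hypothetical, chosen just to illustrate flagging implausible unload durations:

```python
from datetime import datetime, timedelta

# Hypothetical scan log: (trailer_id, door_open, door_close).
# The last record mimics the "logged three hours late" case described above.
FMT = "%Y-%m-%d %H:%M"
scans = [
    ("T1", "2024-01-05 08:00", "2024-01-05 09:10"),
    ("T2", "2024-01-05 08:30", "2024-01-05 08:15"),  # close before open
    ("T3", "2024-01-05 09:00", "2024-01-05 13:45"),  # suspiciously long
]

def flag_unload_times(rows, max_hours=3):
    """Return trailer IDs whose unload duration is negative or exceeds
    max_hours -- candidates for a process fix, not automatic deletion."""
    flagged = []
    for trailer, opened, closed in rows:
        duration = datetime.strptime(closed, FMT) - datetime.strptime(opened, FMT)
        if duration < timedelta(0) or duration > timedelta(hours=max_hours):
            flagged.append(trailer)
    return flagged

bad = flag_unload_times(scans)  # ["T2", "T3"]
```

In practice a check like this feeds the feedback loop Rao describes: flagged records point back at the operating process that produced them rather than being silently dropped.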
Yeah, very interesting insights. And on data capture, I love your example of the trailer door and when that specific data point is captured, based on just a human operating process. Very interesting. And Xifeng, I don't know if you had any other thoughts on the workflow of data and data-related projects. Yeah, for sure. For modern companies, the BI team, the data team, is not small, I think at any company, because every team across the business, like merchandising, transportation, fulfillment, they all need tons of data. So if a request is very small, it will often be on the backlog forever. If I have a good relationship with a developer, I can get some source data. First, because there is so much data, I usually have to prioritize what data I need, and deliver that to the BI team to have them build a table for me or grab it from the source table. And often when I get a request from my manager, it's not well defined. For example, my manager asks me, can we reduce our outbound cost? There are different angles from which you can solve this problem, right? You can do it by reducing package count or order splits, or by reducing the OD-pair (origin-destination) distance. So when I start, I always take a look at what data I have and what KPIs I can generate from that data, to define what is good, what is bad, and what the trade-offs are. As long as I have the data available, I then do some validation and cleanup; that's the process of my day. And then I do some data cleanup and build some hypotheses, because it's always this type of what-if question: what if we do A, B, C, what's the impact to the business downstream and upstream? So I always have some hypothesis, and then, once I have the data, and most of the time it's not perfect, I build a model. 
It's either a prediction or an optimization model, and then I run a simulation, like a unit-level simulation. That involves a very large data set, maybe one or two years of customer order data, to capture the shape of the business, like product sizes and customer order behavior. So we run the simulation and use the big data to understand things. It's similar to setting up the operational KPIs using the new algorithm. Then we understand where the bottlenecks and operational challenges are, and the trade-offs between business units, because sometimes if we optimize one KPI, it will make another KPI worse. So you have to understand the trade-off at the overall level, right? Do we benefit one process but maybe sacrifice another? And then, once we have this holistic view of the simulation results on the different components of the supply chain process, we bring the leaders all the KPIs based on the simulation results from all the data, and then they make a decision. Those are great insights. That sounds very much like the framework we discuss in class, that's awesome. So I want to dive into a couple of other details, but before we do, let me launch our second poll here. Let me just load that up. It's a data-history trivia poll, and the question is: which one of these statements is true? I'd love to hear your thoughts on the history of data and machine learning and which of these different options is true. And while we give you a few minutes to take a look at that poll, I'll go on to the next question. So, there are many tools. I know you've mentioned several already, Gurobi and SQL and some others, as the discussion has evolved. There are many different tools available for the lifecycle of data, from extraction and cleaning, preparation and loading, to modeling. 
And there are commercial off-the-shelf packages as well as open-source software like Python, which I think you mentioned, and R. So my question is: how have you approached this complex ecosystem of lots of different tools, commercial and open source? How do you approach it from a career perspective? Do you try to learn all the different tools and then just adapt to your company, or do you try to focus on one, like just pick Python and become an expert in Python? So Rao, how have you approached this problem of the many different tools available over your career? Sure, great question, Callum. I would answer it this way. Let's stick to the optimization tools first. If you are talking about optimization tools, we have a variety of products in the industry. Of course, Gurobi is the state of the art. We do have AMPL, which we use as the front end to create the model and all that. And there's also CPLEX. Of course, CPLEX and Gurobi have primarily the same functionality, but Gurobi's ability to solve large-scale optimization problems in reasonable time is far better than CPLEX's, with all the developments they have added in recent days. So to answer your question within the optimization space, I really don't have to update my skill set or anything like that, because the core of this whole thing is the optimization background. Once somebody has a good understanding of how an optimization model works in a real-life industry environment, then whether you move from AMPL to something else, or maybe use Python as an interface and call Gurobi within Python, or do something else, all these things are a matter of a few hours or a few days here and there to learn. There's nothing significant to really worry about there. 
So the key here is understanding the core concepts around an optimization process or optimization methodology, and being in a position to apply those skills in industry; the coding skills and tools come as a byproduct. On the other hand, to answer your question at a different level: if somebody is asked to create a traditional C++ program to implement the simplex method from scratch, that requires a different kind of skill set, and not everybody in the optimization space will be able to put their head around that. There is a set of people who are very much into modeling and analysis, and of course they can handle Python and any other tool around that. But if you have to create your branch-and-bound algorithm from scratch and then write it in C++ or C# or Java or whatever, that requires more programming-related skills. In the business world, what happens is we are in an environment where we are quickly turning out results, and we are taking advantage of all this commercially available software like CPLEX, Gurobi, and so on. So I'm not engaging myself on a day-to-day basis in writing a branch-and-bound algorithm or developing a new heuristic; that happens more in the academic and research community. In industry, we are not really going into those domains, because if I spend my time on those issues, my boss will tell me: this is not going to be productive for the company when there is something available in the industry. Why do you want to create a branch-and-bound algorithm from scratch? So in industry, the issues are different. Again, to answer your broader question in terms of the tools that are coming up: yes, we are seeing an evolution in the tools and all that. But when it comes to simulation, optimization, all these kinds of tools, they don't really bother me. 
And even if something new comes along, sometimes it's as fast as one or two days, and sometimes it will be one or two weeks, to get on to it. But when there is a new tool that needs to be learned, I think we always need to make that effort, if that is the talk of the town. For example, I'll tell you my own case: I was not using Python before coming to Amazon and other places; in fact, I picked it up halfway through my career in Green Bay. In those days the tool set was different, but we quickly learned that everybody around us was using Python and that it was important to use it. So, more than 10 or 15 years back, I made the transition to Python. There are times when it is really important for people to keep up, because if I want to continue to live in my old world, I won't be able to solve large-scale problems; if I want to take advantage of that 500-million-row database and all the things I mentioned earlier, I need to be in a position to use other tools rather than limiting myself to my CSV file. Okay. Yeah, those are great insights. That definitely aligns with our perspective here in the MITx Supply Chain Management MicroMasters program, where that foundational knowledge is so important, and then also just that continuous learning, to be able to adapt to the continuous evolution of new tools and technologies that come around. Before we get to the results of our poll: Xifeng, do you have any thoughts on this topic of the diversity of tools, and becoming an expert in one versus having foundational knowledge in those different areas? Yeah, so I echo what Rao just said. There are so many software packages on the market, and they all do the same thing; they're all based on English, right? They just change the syntax. 
So understanding the background of the algorithm or the simulation model is very important, because the key is not to learn something specific like Python; they're all the same, just with different syntax. The key is to understand how to translate the business ask into a model. That's the key part. In terms of computation speed, whatever system we use is about the same, right? I don't require my team members to use tool A, B, or C; it all depends on what you prefer. I worked in a Python environment for many years, and then when I joined GameStop, I learned that some people there use R, which has much better data visualization, so I started to learn R. As long as you know the background of one or two, you should be able to switch over very quickly. You learn what you need. Yeah, those are great insights. That's awesome: the foundational knowledge and the adaptability to learn new tools, building off your existing knowledge from one tool and adapting it to a new one. So let me share the poll results here, hopefully you can see them. It looks like most of you thought the statement about Arthur Samuel at IBM, who developed a program that learned to play checkers, was the true one. This was actually a bit of a trick question: all of these options were true. I'm a bit of a history buff, I love history in lots of different areas, and I found some interesting historical facts about data and machine learning. So again, trick question, these are all true. Maybe the earliest recorded instance of data analysis was back in 1662, quite a long time ago, and it has all evolved to where we are now, with big data and databases and Python and machine learning and neural networks and all the technologies we have today. So awesome. 
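The panelists' point that the modeling concept matters more than the tool can be illustrated with a toy version of the facility-location question Rao raised earlier. This is a sketch in plain Python with brute-force enumeration instead of a solver; all site names, distances, and demands are made up:

```python
from itertools import combinations

# Toy data: candidate site -> {customer: distance}; demand per customer.
# Every name and number here is hypothetical, for illustration only.
dist = {
    "Dallas":  {"c1": 3, "c2": 8, "c3": 6},
    "Atlanta": {"c1": 7, "c2": 2, "c3": 5},
    "Reno":    {"c1": 9, "c2": 9, "c3": 1},
}
demand = {"c1": 100, "c2": 80, "c3": 60}

def best_sites(k):
    """Brute-force k-site facility location: each customer is served
    from its nearest open site; minimize demand-weighted distance."""
    best = None
    for sites in combinations(dist, k):
        cost = sum(d * min(dist[s][c] for s in sites)
                   for c, d in demand.items())
        if best is None or cost < best[1]:
            best = (sites, cost)
    return best

sites, cost = best_sites(2)  # ("Dallas", "Atlanta"), total cost 760
```

With thousands of nodes and millions of lanes this enumeration explodes, which is exactly where MILP solvers like Gurobi or CPLEX come in; the model stays the same, only the engine and the syntax change.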
So with that, I might pivot the conversation a little from the focus on data and go into some questions that we have about machine learning. And again, we'll save the last 15 minutes of our session today for questions from you, the audience, so please enter those in the Q&A — I see there are several in there already. So, transitioning to more machine-learning-focused topics: machine learning is obviously a powerful tool for making predictions based on data, and I know there are many applications in supply chain. Before diving into the details of machine learning — and I know that both of you also focus on simulation and optimization — I wanted to start with a broad question: do you have one or two applications of machine learning in supply chain that you find interesting? Maybe this time starting with you, Xifeng.

I think demand forecasting. This is very important for all businesses — almost every business unit uses it. Finance uses it for the company guidance; procurement needs it to buy enough quantity, instead of buying too much and having it sit there forever. The transportation team needs to estimate what the volume looks like during peak, because then I can negotiate a better rate with the carrier. And the fulfillment team needs the volume forecast for labor planning and storage. So each team wants demand forecasting at a different granularity of output: some just focus on overall unit flow to use for labor planning, and some forecast cubic flow to design capacity, right? But demand forecasting is already very challenging, so we usually tailor different models for different needs, and some business units can tolerate a higher over-forecast.
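The different-granularity forecasts Xifeng describes could be sketched with a simple exponential-smoothing baseline — a minimal illustration with made-up numbers, not either company's actual model:

```python
def exp_smooth_forecast(series, alpha=0.3):
    """One-step-ahead exponential smoothing: level = alpha*y + (1-alpha)*level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical weekly history at two granularities the panelists mention:
units = [1200, 1350, 1280, 1500, 1620]   # total units shipped (labor planning)
cubes = [830, 910, 870, 1010, 1080]      # cubic feet shipped (capacity design)

print(f"next-week unit forecast: {exp_smooth_forecast(units):.0f}")
print(f"next-week cube forecast: {exp_smooth_forecast(cubes):.0f}")
```

A real pipeline would use richer models (seasonality, promotions, new-launch signals), but the point carries over: the same sales history gets forecast at whatever granularity each team — labor, transportation, procurement — actually consumes.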
For example, say I have a demand forecast to give procurement a guideline, like how many units of this SKU I have to buy. Demand forecasting is one thing; the other thing is how we execute it. If I tell the procurement team to buy 10 of these units, but the minimum case-pack quantity is 20, I have to round up, right? You have to consider everything at the execution level. But demand forecasting is what I am super interested in.

Yeah, that's definitely a critical problem — forecasting is a very interesting area, and I'd actually like to dive into some of the details there as well. But maybe, Rao, do you have any thoughts on one or two applications you find interesting in supply chains?

Absolutely. I completely agree with Xifeng: demand forecasting is one of the most important aspects for any organization. We are talking about forecasting a variety of things — labor, demand for products, a wide range of things that happen in a company. If you go back to the theoretical arena for a minute, that fits into what's called supervised learning; regression analysis and all those methods fit in there. But let's focus for a minute on the other side, that is, unsupervised learning. Basically, if you don't have labeled data, then you get into that kind of problem. We end up using a clustering approach, which fits under this unsupervised-learning umbrella. We use clustering to group products. For example, say I'm engaged in a large-scale network optimization study — every company will have hundreds of thousands of products and all that.
So how do we group these products into a meaningful number of product groups, so that we can use those groups as an input to the optimization model? We end up doing cluster analysis a lot, and that is another application on top of what we talked about earlier.

Yeah, those are both great examples — demand forecasting, and also the unsupervised-learning example of clustering. In Supply Chain Technology and Systems we're studying the difference between supervised and unsupervised learning right now, so those are great examples. And maybe building off of both of your comments: how do you see machine learning fitting into the broader approach? Both of you also specialize in optimization and simulation. How does machine learning fit into those types of projects? Obviously the demand forecast is going to be critical for them. So Xifeng, how do you see demand forecasting fitting into the broader projects you're involved in, with optimization and simulation and some of the others?

Yeah, so I have been involved in different types of demand forecasting, and every demand forecasting model has to be tailored to the business pattern — the sales pattern. GameStop, where I work, is very different from a traditional e-commerce company, because the problem we always have is a limitation on inventory; we cannot guarantee enough inventory — for example the PS5, or the newer Xbox. In those cases, demand forecasting is not very important, because it doesn't matter how many we buy, we can always sell them. So it's very different. Previously, to fit this into the business question, we used a supervised model to forecast the demand. But the problem is, if there's a new launch — a new game, right? — there's no historical pattern for it, so we couldn't forecast very accurately.
Even at the total-unit scale, where we forecast company-level units sold, if there's a big launch — a new game, a new system — we cannot capture it. But if we use an unsupervised model, then we lose the historical information that would better predict the data. So it depends on how we solve the problem and what the problem is. One interesting project we used to have was a demand forecasting project at SKU level, to place inventory. A team member of ours was able to translate this modeling problem into a very simple one that could be solved in Excel, because with the variability of the demand we couldn't forecast accurately anyway. So he said, let's find a very easy way to estimate the demand, one that the business can accept.

Yeah, that's a very interesting example, for sure. I know we're running up to the last 15 minutes of our time here, so Rao, do you have any other thoughts on how machine learning fits into the broader projects and approaches?

Sure — definitely a great question, Callum. Machine learning certainly fits into many initiatives in the business environment. Let's talk about one simple example — I'll try to finish this in two minutes, since I see we are probably running into a time constraint. Basically, consider a very simple problem where you have to determine where to open new warehouses for a company. Then the question of getting the forecasts associated with the products will come into play. On top of it, you also need to take into consideration what products the company is making and how we group them into related buckets. So when we are talking about a large-scale supply chain optimization model, the inputs to that model will come from a machine learning model.
The machine learning model essentially helps us forecast what kind of demand you would anticipate in the next six months or one year — or, in fact, when we talk about network design studies, we should be able to think as far out as five years, because you are making strategic decisions for the company; you really don't want to limit yourself to the forecast for the next six months or one year. So forecasting and the supervised learning methods we talked about — all this regression — really play a vital role, and the outputs from those models are the key inputs to any kind of optimization model. But just as important, as I said, is clustering — that is another machine learning model. Clustering helps us group all these products into a meaningful number, do all that analysis outside, and feed it into the model I mentioned. And in addition to this, machine learning models are really helpful for understanding — to give you another simple example — how customers are ordering products: when somebody orders milk, are they also ordering butter and bread? Because in the e-commerce environment, what constitutes a package — what is in the package — will determine the size and the weight, and also the eventual transportation cost associated with it. To get to the bottom of that kind of analysis, we can use clustering and other machine learning models.

Those are great insights, especially the concept of a machine learning prediction as a key input to an optimization model. Awesome. So before we launch into some questions from you, the audience, I wanted to launch our third poll here, so let me just load that up.
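The product-clustering step Rao describes — collapsing hundreds of thousands of SKUs into a handful of groups to feed a network model — might be sketched like this. The features and sizes are hypothetical, and the hand-rolled k-means is just for illustration (in practice a library implementation would be used):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distances from every point to every centroid -> nearest-centroid labels.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical per-SKU features: [weekly demand, unit cube], standardized so
# neither feature dominates the Euclidean distance.
rng = np.random.default_rng(1)
skus = rng.normal(size=(1000, 2))
labels, centroids = kmeans(skus, k=8)

# 1,000 SKUs collapse into (at most) 8 product groups for the network model.
for g in range(8):
    print(f"group {g}: {(labels == g).sum()} SKUs")
```

The resulting group labels, not the raw SKUs, become the product dimension of the optimization model — exactly the "do the analysis outside and feed it in" workflow described above.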
For our third poll, we'd love to hear your thoughts on what was the most interesting part of today's session. While I give you a chance to answer, I'm going to pick a few questions from the Q&A. This is actually a great question from Faran — maybe I'll start with you, Xifeng. Data quality is obviously a critical area, and the question is: what techniques are used to clean low-quality, polluted historical data? Do you have any thoughts on techniques used to clean data?

So there's no single specific technique — set up KPIs and use your data. From those data-operations KPIs you will see if there's a data error. Usually when we launch a new program, we use the KPIs to monitor: is this a system issue, or did we capture the raw data correctly? Use KPIs, key metrics.

That's an interesting insight — starting from the end and working backwards to identify issues downstream or upstream, I guess. Interesting. Rao, do you have any other thoughts on techniques you have experience with to clean data?

You know, I ended up using some commercially available tools — I don't have the names off the top of my head — but Python is a better tool to really analyze and understand what the issues in the data are. I'll give you one of the simplest examples, which anybody will be able to see. In the US we have this five-digit ZIP code, but when you go to Vermont and other Northeastern states, the codes start with zero, and if that field is read as a number, the zero is stripped away, which results in a ZIP that is invalid. Cleaning that kind of thing is very straightforward. The important point I'm trying to make here is that you need to have knowledge and understanding of the process behind the data you are using — without that, you cannot just bring in a commercially available tool and expect it to clean your data.
For example, if the tool doesn't know that the US has five-digit ZIP codes where the first digit can be zero in some cases, there's no way it can help. I have seen some commercially available tools not satisfy our requirements in the past. So as a result, my understanding and belief, based on what I've worked on in the last several years, is that we need a complete understanding of the process behind the data. Suppose somebody gives me a transportation cost to move a package from Seattle to New York of only 50 cents — that tells me there's something wrong with the data. Someone who is not familiar with the US transportation network and its costs might buy that number and use it in their analysis. So the key point I'm trying to make here is that you need to be knowledgeable about the business, about the process, about the geography, and all of that. I'll throw in another simple example: a long time back, I was given a supply chain network optimization study in India, and I know the geography of India very well. They gave me a location, saying this is where the data is coming from; in reality, the data was pointing to a somewhat different location — there was a problem with the geocoding part of the data, and that's what was causing it. If you are knowledgeable about the geography, about your business process, and about everything else, that helps in cleaning the data — and Python is the central tool we used to do this.

That's great insight — the limitations of technology, sometimes, and how critical that business knowledge and intuition is to help address issues with data quality. So let me close our poll here and share the results before we launch into our next question.
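Rao's leading-zero example is easy to make concrete. A minimal sketch — the helper name and validation rules are illustrative, not any specific commercial tool's behavior:

```python
def clean_zip(raw):
    """Restore a 5-digit US ZIP code that lost its leading zero(s) when a
    spreadsheet or loader read the column as a number."""
    digits = "".join(ch for ch in str(raw).strip() if ch.isdigit())
    if not digits or len(digits) > 5:
        return None          # flag for manual review rather than guessing
    return digits.zfill(5)   # Vermont's 05401 survives being read as 5401

print(clean_zip(5401))     # a numeric read of Vermont's "05401" -> "05401"
print(clean_zip("98101"))  # already clean -> unchanged
print(clean_zip(""))       # unusable -> None, route to a human
```

The same "know the process behind the data" rule covers his 50-cent Seattle-to-New-York rate: a simple range check against plausible lane costs would flag that record for review instead of letting it into the model.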
Thank you all for participating in our polls today. It looks like many of you valued expanding your knowledge of data and machine learning in supply chain, as well as understanding the impact of data and machine learning on supply chain functions, and learning about some specific applications. That's great — thank you for sharing your input. So let me pick another question here. It looks like we have an interesting question on COVID — I know COVID has been with us for quite a while now and is an important topic. I can't find the name, sorry, but the question is: supply chains were critically affected by COVID-19, so how do optimization models and data help organizations recover from COVID-19? And then, building on that, how do you use the last two years of COVID impact to inform forecasting future demand? Obviously the last two years have been a unique experience for all of us in a number of ways. So, two questions: how did you use optimization, machine learning, and data to adapt and respond to COVID, and how is that going to impact things going forward? Maybe starting with you this time, Rao, if you have any thoughts on that question.

So COVID has come into discussions in our projects, and to answer this question: it is not going to change my optimization model, and it's not going to change any of the basics associated with the machine learning model. But the key thing it does change is how we look at the demand forecast for the coming periods. For example, COVID might have spiked the demand for a particular product, or might have completely eliminated demand for a particular product. So basically, to answer the question in short, it is going to impact the way we forecast demand for the products.
But that output, whatever comes out, is still going to be the input to the optimization model. So the optimization part of the analysis is going to be pretty stable and pretty much the same; what changes is what we feed into the modeling process in terms of the future forecast, the future demand. And talking about the methodology associated with that: at least to the best of my knowledge, I haven't seen people running an entirely different kind of forecasting process, but it does require manual intervention sometimes. The reason manual intervention is important is that if you feed a machine learning model five years of history for a product, it will tell you, okay, next year we are going to do this much — but that is not the case in the presence of COVID; the demand has either gone up or gone down. That is where manual intervention is really helpful to clean up the data. Whether you're talking about machine learning or optimization, at the end of the day, the role of any business person is to make sure that the solutions you hand over for execution and implementation go through one more pass of analysis, which requires manual intervention, just to make sure we are giving something that is practical for the company.

That's a great insight — the concept of the human in the loop. Machine learning and algorithms are only so powerful, and with something like COVID, we might need that human in the loop to help address it. That's great. So Xifeng, following on, do you have any thoughts on how you used machine learning, data, and optimization to adapt and respond to COVID, or how it might impact things going forward?

I fully agree with what Rao just mentioned. There's no change in the methodology or the modeling itself.
The data is the same, the model is the same, but there was a very big shift in customer ordering behavior — online orders, say, three times more, and retail store orders, say, 50% less. So we had to adapt to this volume shift between channels. The traditional model doesn't tell us where the business will go, so, as Rao mentioned, we have to use human intervention, which is very important, to put guidelines on the models. With COVID, we had a very bad labor shortage in the DCs everywhere, and very big bottlenecks in the supply chain, like transportation capacity. So we had to better predict our volume and deliver that number to the relevant parties to help them design their capacity. It requires us to be more accurate about what the volume looks like, but it's also more challenging.

Yeah, those are great insights. It has definitely been an interesting two years, so having the human in the loop makes a lot of sense, along with the foundational knowledge and experience on the business side to help address some of these challenges. We probably have time for one more question. Edwin has a great question here about how data is seen from a leadership perspective. His question is: how frequently does senior leadership ask for data support in operational decisions? And maybe broadening that a little: how do you translate your day-to-day work of building optimization models into the strategic goals your senior leadership has? Xifeng, if you want to start with some thoughts on that question.

So senior leadership, every day, looks at the weekly business review and daily business review to track whether the business is healthy. If some KPI has a big change, then either it's an operational mistake we made, or there's a data issue.
For long-term strategy, we have to use the data to run simulation or optimization models over different scenarios. For example: should I locate my facility in this location versus another location? In terms of location alone, there are different definitions — the customer-best location, the capacity-best location, the location where labor is cheaper or where the lease is much cheaper per square foot, or the transportation-optimal location. So there are different data we need to support the same analysis. At the end of the day, we bring all the data together to run the different scenarios and mimic: if we operate this way, this is what the KPIs will look like. So we provide a very high, aggregate level of operational KPIs — we don't just bring all the data to leadership.

Yeah, that makes sense. You don't want to send them the SQL query or the database; you've got to bring the insights. Very interesting. So Rao, do you have any thoughts on working with leadership?

Sure, sure. When we interact with senior leadership — and most of my projects fit into strategic network design, so I've had several opportunities to work with senior leaders in various companies — my first step is to understand: does this person have any kind of analytical or optimization background? If not, the responsibility lies on my head to translate. And of course, even if the person is completely in the optimization space, they may not have time to think on a day-to-day basis about the methodologies and processes we use.
So to give you a short answer: it is important to make sure that we speak their language and give the answer in a way the business can really understand, in plain, simple English terms, rather than using super-complex terminology. I don't want to talk about branch-and-bound or the simplex algorithm when I go and sit in front of my senior leadership — even if that person has a PhD in operations research, that's not what the meeting is for. So at a high level, it is important to stay focused on the business goals and translate everything the model is doing into plain, simple English. Rather than saying we did some sensitivity analysis — what is sensitivity analysis? they get hung up there — I say, I changed the numbers for the future demand, or however you are running that sensitivity, and use that simple English rather than the term "sensitivity analysis." The terminology and jargon you use while interacting with senior leadership plays a vital role, because if they don't understand your language, you won't get buy-in — you may have a very valid point and a very valid set of numbers, but it won't be convincing to them. Communication is the key aspect there.

I just want to add one point here. Absolutely, we have to speak the language senior leadership speaks, because I had an experience when I first started: I had a meeting with my CEO, and I brought a very complex model, and when I got to that page, some people picked up their cell phones — they lost interest in the end. As I've gotten more experienced, I leave all the model details in an appendix and just present the key metrics from the model. So speaking their language is very important.
That's great insight — speaking their language, and translating the complex details of the data and the models into the language they speak. I wish we had more time; I know there are a ton of questions here in the Q&A, and I wish we had another hour to go through them all, but we are running up against the hour. So I want to start by thanking our guests, Xifeng and Rao, for sharing your insight, your experience, and your knowledge today — I very much appreciate your time. I also want to share my screen here briefly and mention that this webinar is part of a series we are doing here with the MITx MicroMasters Program in Supply Chain Management. Next Wednesday we have our second event in the series, on the changing landscape of omnichannel retail fulfillment — I hope you all can join that one. I'll paste a link right here into the chat to register for that event, and I'd encourage you to attend if you have a chance. I also want to thank all of you for participating today in our polls and offering your insights and questions. And remember again, for those of you in SC4x, Supply Chain Technology and Systems: today is the deadline to register for verification to get that certificate, so I encourage you to do so after this event if you haven't already. Again, thank you to Rao and to Xifeng — I don't know if you have any last thoughts, but thank you again for your time today.

Thanks, Callum. Good luck to all of your students and all of your staff there.

Yeah, thank you both, and thank you everyone, and take care.