Dear participants, welcome to the course on supply chain digitization. It is jointly taught by Professor Priyanka Verma, Professor Sushmita Narayana and Professor Dev Prathadas from Indian Institute of Management, Mumbai. This is lecture 8 of module 3, that is, analytics in supply chain management. In the last lecture, we developed this regression tree and explained the tree to you. In this lecture, the first part will give you the data of a few more retail stores, and then we will try to find out the expected demand from these retail stores. So, let us start with retailer A. Retailer A is located in the west region, its credit balance is 10 lakh rupees, and it is located in an urban area; its age is 12 years, the size of the store is 8,000 square feet, a promotion offer was given during that time period, and we had 3 holidays. Can I predict the order quantity from this retailer, that is, the demand for this retailer? Yes, I can. I have to check these rules. First, I check the store size. The store size is 8,000 square feet, so it falls in either node 3 or node 4. If I show the bigger picture, in both node 3 and node 4 the size of the store is less than or equal to 30,500 square feet (30.5 thousand), which matches our criterion. Then, the promotion offer was given, that is, promotion equal to 1. So it matches: promotion equal to 1, size less than or equal to 30,500. Therefore, the predicted demand is 2360. Now, can you find out the predicted demand of retailer B? Yes. First, I have to see the size of the store, which in this case is 33,000 square feet. Obviously, the size is big, so it falls in either node 5 or node 6. Next, I have to see the age. The age is 23, which means this node. So, retailer B falls here, and the predicted demand is 8227.
So, from this retailer I would predict that they will place an order of 8227 units. Now, if I show you the bigger regression tree: the size is more than 30,500 square feet, so I go to node 2. After going to node 2, I have to see the age. In this case, the age is more than 17.5 for retailer B, and therefore my prediction is 8227. Now, can you predict the demand of retailer C, whose characteristics are as follows? It is located in the north, the credit balance is 3 lakh rupees, it is in a semi-urban area, its age is 12 years, and the size is 20,000 square feet. A promotional offer was given by the retailer, and I have 2 holidays. So, what would be my demand? First, I check the size of the store. The size is 20,000 square feet, which means either node 3 or node 4. Then the next parameter I have to see is promotion. Promotion was given, so I am in this node, and for C the predicted value will be 2360. So, for this kind of retailer, who is located in the north region, whose credit balance is 3 lakh rupees, who is in a semi-urban area, whose store has been running for the last 12 years, whose store size is 20,000 square feet, where a promotional offer was given and there were 2 holidays during the week, I would predict the demand to be 2360. If I see it in the complete decision tree: the size is 20,000 square feet, which is less than or equal to 30,500 square feet, and the promotion offer was given, so I am here at 2360. Now, I will do one more. Let us say there is another retailer, retailer D, with these characteristics. Can I predict the demand? Of course we can with this data. First, I have to see the size, which is less than or equal to 30,500 square feet, so I am in either node 3 or node 4. Then I see that the promotion was not given, so for D I am here, and the predicted value is 943. And where does A fall? A falls in the promotion node, together with C.
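The four worked examples above can be sketched as a small rule function. This is only an illustration of the decision rules read off the tree: the cutoffs (30,500 square feet, age 17.5) and the leaf predictions 943, 2360 and 8227 are the ones quoted in the lecture, while the leaf for large, younger stores (node 5) is not quoted, so it is left as a placeholder.

```python
def predict_demand(size_sqft, age_years, promotion):
    """Predict order quantity from the regression-tree rules above.

    size_sqft : store size in square feet
    age_years : age of the store in years
    promotion : 1 if a promotional offer was given, else 0
    """
    if size_sqft <= 30500:                     # node 1: smaller stores
        return 2360 if promotion == 1 else 943  # node 4 / node 3
    else:                                       # node 2: larger stores
        if age_years > 17.5:
            return 8227                         # node 6
        return None                             # node 5: value not quoted in the lecture

# Retailers A and C (small store, promotion given) -> 2360
# Retailer D (small store, no promotion) -> 943
# Retailer B (33,000 sq ft, age 23) -> 8227
```

Note that once a retailer falls into node 2, the promotion variable no longer matters; only age decides the leaf.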
So, A and C fall here, D is here, and B is here. Similarly, I can take any retailer: you give me the data of any retailer, that is, the values of these 7 parameters, and I can predict the demand of that retailer using this regression tree. Now, the question is: how was this tree built in the first place? What kind of algorithm runs behind it? We have seen the classification tree earlier in this module, and the regression tree has a lot of similarity with the classification tree; there are only a few minor changes here and there, which we will explain. So, let us now understand the steps of building a regression tree model. This is the whole regression tree model which we have just seen. The question is: how was this model built? Why was node 0 split by the parameter size of the store? Why was node 1 split using promotion? Why was node 2 split using age? There are 7 variables, so why did these 3 variables get importance? And as far as age is concerned, why is the cutoff value 17.5? Why not 10? Why not 20? Why not 18? Why not 17.4? Why is this particular value of age used for splitting node 2? Similarly, for the size of the store, I could take a cutoff value of 20,000 square feet or 10,000 square feet, but why is 30,500 square feet chosen? All of these questions will be answered once we see the steps. The first step of building the regression tree is to start with the complete training data in the root node. In our case, the root node is node 0. How many observations do I have here? I have all 700 training observations. So, I take the complete training data and start the model. The next step, which is very crucial: I split the root node using a predictor variable, also called an independent variable, such that it results in the maximum reduction in mean squared error.
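This split-selection step can be sketched as a brute-force scan over candidate cutoffs of one numeric predictor. This is a simplified illustration of the idea, not the exact routine of any particular software package: for each cutoff we compare the parent node's MSE with the observation-weighted MSE of the two children, and keep the cutoff with the largest reduction.

```python
def node_mse(y):
    """Mean squared error of a node: average of (y_i - y_bar)^2."""
    if not y:
        return 0.0
    y_bar = sum(y) / len(y)
    return sum((v - y_bar) ** 2 for v in y) / len(y)

def best_split(x, y):
    """Scan every candidate cutoff of predictor x and return the cutoff
    giving the largest reduction in observation-weighted MSE."""
    parent = node_mse(y)
    best_cut, best_reduction = None, 0.0
    for cut in sorted(set(x))[:-1]:          # the largest value cannot split
        left = [v for xv, v in zip(x, y) if xv <= cut]
        right = [v for xv, v in zip(x, y) if xv > cut]
        child = (len(left) * node_mse(left)
                 + len(right) * node_mse(right)) / len(y)
        reduction = parent - child
        if reduction > best_reduction:
            best_cut, best_reduction = cut, reduction
    return best_cut, best_reduction

# Toy illustration (invented numbers): small stores order little, large stores a lot
cut, gain = best_split([8, 12, 20, 33, 40, 45],
                       [943, 2360, 2360, 8227, 8227, 8227])
```

Repeating this scan over all 7 predictors at node 0 and keeping the overall winner is what, in the lecture's model, picks "size of the store, cutoff 30.5 thousand square feet."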
If you remember the classification tree, in that case we used to split the node using a predictor variable such that it results in the maximum reduction in impurity. So, in place of MSE we used impurity, measured by the Gini index or entropy. In this case, since our dependent variable is continuous in nature, we cannot use the Gini index or entropy; we use the mean squared error. That is one difference from the classification tree. In the classification tree, we used Gini or entropy to find out which predictor variable should be used to split the node so that it results in the maximum reduction in impurity. In the classification tree, my objective is the reduction of impurity; in the regression tree, my objective is the reduction in MSE, that is, mean squared error. Is that clear? Now, in step 3, I need to repeat step 2. Once I apply step 2, I get to know, for each node, which predictor variable will be used to split it. If I go back: at node 0, I used size of the store as the predictor variable, with 30.5 thousand square feet as the cutoff value. Then, at node 1, I used promotion to split the node. At node 2, I used age, and the cutoff value was 17.5. This is how we split the nodes. So, I have to repeat step 2 for each internal node, using the independent variables, that is, the predictor variables, until the stopping criterion is met. In this case, you can see the nodes: node 0 was split using size of the store, with cutoff value 30.5; then node 1 was split using promotion, which is a categorical variable, so I have 0 and 1; and node 2 was split using age, with cutoff value 17.5. That is how I got node 3, node 4, node 5 and node 6. I could go down further, but we have stopped here. So, why did I stop here? What are the general stopping criteria? We will discuss that now. There are a few stopping criteria. The first stopping criterion is the level of the tree from the root node: how much depth do you want to go to?
In this case we have gone up to level 2. This is level 1: in level 1, I have node 1 and node 2. In level 2, I have nodes 3, 4, 5 and 6. So, I have gone up to this depth; the depth of the regression tree is 2. I could go one more level, and then the depth of the regression tree would be 3; another level, depth 4, depth 5, and so on. And as you can see, if we increase the depth of the regression tree, what happens? My number of nodes increases, and the prediction becomes better for the training data. I cannot comment as of now on the test data, but for the training data my prediction will keep improving if we keep increasing the depth of the tree. What happens on the test data, we will see in one of the future slides. Another stopping criterion is the minimum number of observations in each node. If you see here: in node 3, I have 198 observations, 28 percent; in node 4, I have 414 observations, 59 percent; so I have a good amount of support. But if you see nodes 5 and 6, they have only 8 percent and 5 percent. Now, as a decision maker or model builder, you have to take a decision: should I split this node further or not? As of now, I have only 5 percent here. If I split this node, obviously the observations in the two resulting nodes will reduce. If the number of observations reduces, my support also reduces, and then I will have doubt in my prediction. I will get some predicted value, but the question is whether this predicted value is accurate or not, and whether it will work fine on the test data. Therefore, we have to stop somewhere. We should not keep splitting nodes, and we should not keep increasing the depth of the tree. If we increase the depth, then for the training data my accuracy will keep on increasing.
My MSE value will reduce, but on the test data I might see something different. Therefore, I have to impose a minimum number of observations in each node and not split beyond it. For example, if I decide the rule that I will not go below 5 percent of the observations, then I will not split this node; if you decide that you will not go below 10 percent, you can also stop there. If I keep increasing the depth, my number of observations per node will reduce, and that might hurt my accuracy on the test data. That is the second stopping criterion; the first, as you recall, was the level of the tree. Now, if you see node 4, it has a large number of observations, 414, so I can easily split this node, because even after splitting I would have a good number of observations in both children nodes. Similarly, I can further split this node as well, because I will also have a good number of observations in its children. Wherever you see a good number of observations, you can split the node; but if the number of observations is small, say less than 5 or 10 percent, you have to be very careful and use the stopping criterion to stop, and you should not split the node further. Then I have another stopping criterion: stop if the largest decrease in MSE would be less than some threshold. I will explain this in detail. If you look at node 0, I have all 700 training observations, my predicted demand is 2270, and my MSE, the mean squared error, is 8151813. Now, if I want to split this node, I have to use the particular predictor variable, that particular independent variable, that results in the maximum reduction in MSE. In this case, if I use size of the store with the cutoff value 30.5, then I get the maximum reduction in MSE going from node 0 to nodes 1 and 2.
If, in place of size of the store with cutoff value 30.5, I used any other independent variable, or any other cutoff value, my reduction in MSE would not be as large. This particular combination, size of the store as the independent variable with 30,500 square feet as the cutoff value, gives me the maximum reduction in MSE. Now, as you keep splitting nodes, I can use some threshold, let us say delta: if the largest decrease in MSE obtainable by splitting a node is less than this threshold value, then I will not split the node; I will stop splitting. So, you have to decide the threshold; you specify delta in the model, and if the largest decrease in MSE is less than delta, the node is not split further. These are the three important stopping criteria for the regression tree. With these stopping criteria, you know when to stop growing the regression tree. But I think one concept has still not been explained in detail: MSE, that is, how this MSE value is calculated. In the classification tree it was the Gini index or entropy; here it is MSE. So, how is MSE calculated? We will use the data which we used to develop the model to explain the MSE concept. I have 700 observations in the training data, and for each of the 700 observations I have the values of the 7 independent variables, including the region, and I have the dependent variable, order quantity. Now, with 700 observations and no other information, the best predicted value is their average. If I take the average of all 700 observations, it turns out to be 2270.
This average you can write as y bar, and I have y 1, y 2, y 3, ..., y 700. These are my actual order quantities, and y bar = 2270 is my predicted value. What do I do next? I calculate the squared error. What is the error? y i minus y bar. Then I take its square, sum it over i from 1 to 700, and take the average. For example, for the first observation, 990: (990 minus 2270) squared gives 1638400. Similarly, (780 minus 2270) squared gives 2220100. I sum all these values from i equal to 1 to 700 and then take the average, that is, divide by 700. That value comes out to be the same 8151813. That is why it is called mean squared error: mean is nothing but the average, and here I am averaging the squared errors, (y minus y bar) squared, over the 700 observations. Is it clear now? You will see the same value in the regression tree as well, 8151813, at node 0, in which I have all 700 observations and no information about the retailer, so the best predicted value is nothing but the average value. Now, let us see how the MSE is calculated for the child nodes. I split the node using size of the store: size less than or equal to 30.5, and size greater than 30.5. First I take the data with size less than or equal to 30.5. How many observations do I have? 612 observations, and the predicted demand is 1902. Let us see how we get this. These are the 612 data points for which size is less than or equal to 30,500 square feet.
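The root-node arithmetic above can be checked in a few lines. The two squared-error terms use the sample rows quoted in the lecture (990 and 780 against y bar = 2270); the full 700-row training set is not reproduced here, so the `mse` helper is shown on those quoted quantities only as a sketch of the formula.

```python
def mse(values):
    """Node MSE: sum of (y_i - y_bar)^2 divided by the number of observations."""
    y_bar = sum(values) / len(values)
    return sum((y - y_bar) ** 2 for y in values) / len(values)

# Two squared-error terms from the lecture, at node 0 where y_bar = 2270:
print((990 - 2270) ** 2)   # -> 1638400
print((780 - 2270) ** 2)   # -> 2220100
# Summing all 700 such terms and dividing by 700 gives the node-0 MSE, 8151813.
```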
You can check here: 612 observations with size less than or equal to 30.5. I have all their information, variables 1 through 7, and I have their actual order quantities. In this case I write y 1, y 2, ..., y 612 for the actual order quantities. And what is my predicted value? The predicted value is nothing but the average of these 612 observations, which is 1902. So this, in this case, is y bar. What is the next step? I find the error: in the first case the error is y 1 minus y bar, in the second y 2 minus y bar, and so on; in general, y i minus y bar. How is the first squared error calculated? First I subtract: 990 minus 1902, that is y minus y bar, and then I square it, which gives 831744. And the next one? It is nothing but (780 minus 1902) squared, which is 1258884, and so on. So I get (y i minus y bar) squared for all 612 observations. Then I sum it up over i equal to 1 to 612; that is called the sum of squared errors. Then I divide it by the number of observations, 612. If you do this, you will end up with 6605698 as the mean squared error, and you will see the same value, 6605698, in the tree. Now, how do we get 4829 and the MSE for the other node? I have to look at the data for which size is more than 30,500 square feet. Let us see this table: I have 88 observations with size more than 30,500 square feet in the training data. I have all of their information, independent variables 1 through 7, and I have their actual demand, that is, the actual order quantity. I write these as y 1, y 2, ..., y 88, and the average of these values is y bar, which is 4829. So, what is the error? y 1 minus y bar.
So, 8250 minus 4829 is my error, but I am taking the squared error, (y i minus y bar) squared. The first one, (y 1 minus y bar) squared, gives 11703241. Similarly, (7560 minus 4829) squared gives 7458361, and so on. All these squared errors are calculated, then I sum them up over i equal to 1 to 88 and divide by 88. If I do this, I get 11412707 as my MSE, because I first take the squared errors, sum them up, and then average. That is how the MSE is calculated, and if I show you the regression tree, the MSE there is the same value. At every node we have to calculate the mean squared error, and we have to calculate the average demand of the observations with those characteristics; that average demand is my predicted demand. I use the MSE to find out, at each node, which predictor variable should be used for the split, so that it results in the maximum reduction in MSE, and that variable is brought in. Therefore, MSE plays a very important role in the regression tree, just as we saw Gini and entropy play a very important role in splitting the nodes of the classification tree. You have now seen all of these steps. In the same way, before the summary: take the data with size less than or equal to 30.5 and promotion 0, and with the same procedure you will find the average demand 943 and its MSE; with size less than or equal to 30.5 and promotion 1, you have 414 observations, and you can find the MSE and 2360 as the average demand. The same goes for nodes 5 and 6. For nodes 3, 4, 5 and 6 I am not showing the calculation; you have to take exactly the same steps with the data having those characteristics. For node 3, take size less than or equal to 30.5 and promotion 0, use these two parameters to get the data, follow the same steps as I explained, and you will get the average value 943 and MSE 2384088.
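The per-node MSE values quoted above also let us check the quality of the node-0 split: the quantity step 2 maximizes is the parent MSE minus the observation-weighted average of the child MSEs. A quick check using the lecture's numbers:

```python
# Node sizes and MSE values quoted in the lecture
n_root, mse_root = 700, 8151813     # node 0
n_left, mse_left = 612, 6605698     # node 1: size <= 30.5 thousand sq ft
n_right, mse_right = 88, 11412707   # node 2: size > 30.5 thousand sq ft

# Observation-weighted MSE of the two children after the size split
weighted = (n_left * mse_left + n_right * mse_right) / n_root
reduction = mse_root - weighted

print(round(weighted))    # -> 7210008
print(round(reduction))   # -> 941805
```

Note that node 2's own MSE (11412707) is higher than the root's, yet the split still helps: what matters is the weighted average over both children, which drops from 8151813 to about 7210008.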
Similarly, if I take the data whose size is more than 30.5 and whose age is more than 17.5, you will have 32 such observations; you can find the average demand and the MSE, and they will match. Therefore, do this as an assignment or homework and match these results. Since the procedure is repetitive in nature, I am not showing it in the slides, but I request all participants to do it and verify that the average demand and MSE values match. With this, we will finish the lecture. To summarize the steps of building a regression tree: first, start with the complete training data in the root node; second, split the root node using a predictor variable such that it results in the maximum reduction in MSE; and third, repeat step 2 for each internal node, using the independent variables, until a stopping criterion is met. And what are the stopping criteria? First, the level of the tree from the root node, which can be depth 1, depth 2, depth 3, and so on; second, the minimum number of observations in each node; and third, whether the largest decrease in MSE would be less than some threshold value. These are the steps of building a regression tree. With this, we will stop this lecture. I will see you in the next class. Thank you.