Dear participants, welcome to the course on Supply Chain Digitization. It is jointly taught by Professor Priyanka Burma, Professor Swamita Narayana, and Professor Devav Pratadas from IIM Mumbai. This is the 10th lecture of module 3, the analytics in supply chain management module. In the last lecture, lecture 9, we showed step by step how the Python code can be run to get the regression tree output using the file demand.csv. I request all participants to please try this out yourself and do the hands-on, so that you can reproduce the same regression tree output using demand.csv.

Now, in this lecture we will look at the performance of the model. In the previous lecture we built a regression tree up to depth 2, and the output was a prediction model: given the data of a retailer (whether it is located in an urban, semi-urban, or rural area; which region it is in, west, east, north, or south; how much balance credit it has; the age of the retail store; the size of the retail store; whether promotion was given or not; and how many holidays it has), I can predict what the demand would be. But if you look into the regression tree, although I had region, balance credit, location, age, size, promotion, and holidays as independent variables, the model used only 3 of them: size of the store, promotion, and age. Region, balance credit amount, location, and holidays were not used. So I was wondering what the performance of the model could be. How do I find that out? For that we have to use a performance measurement for the forecasting model, and one such measure is the mean squared error (MSE). The formula is:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Here yᵢ is the actual value and ŷᵢ is the predicted value, which I am getting from the regression tree. For example, a particular retailer did not place an order, but the model is predicting that they will place an order of 943 units, and so on. So I have yᵢ and ŷᵢ, and I find the difference yᵢ − ŷᵢ: that is the error term. I square the error, sum it up over all the observations, and divide by the number of observations. That is the formula of the mean squared error: "mean" means average, taken over the squared error terms. How many observations do I have? In this case n = 300, since I am computing the MSE for the test data; the other 700 observations I used as training data, so I will not use those 700 observations to test the model. Using the test data set, the MSE comes out to be 56,62,511. This is the mean squared error, one of the performance measures for a forecasting model. Now, in Python we can use the following code; since we have not explained this part earlier, I am explaining it here. From sklearn I am importing metrics. Then y_pred_test = regression_tree.predict(X_test): we have the regression tree which we built, and I am predicting for X_test.
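Here is a minimal sketch of the steps just described, assuming the train/test split and the depth-2 tree from the previous lecture; the variable names X_train, y_train, X_test, and y_test are assumptions carried over from that code.

```python
# Fit the depth-2 regression tree and compute the test MSE, assuming
# X_train, y_train, X_test, y_test already exist from the earlier lecture.
from sklearn.tree import DecisionTreeRegressor
from sklearn import metrics

regression_tree = DecisionTreeRegressor(max_depth=2, random_state=42)
regression_tree.fit(X_train, y_train)          # the 700 training observations

y_pred_test = regression_tree.predict(X_test)  # the 300 held-out observations

# Mean squared error: average of (y_i - y_i_hat)^2 over the test set.
print(metrics.mean_squared_error(y_test, y_pred_test))
```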
So, X_test is the data which we kept aside, so I am using that test data and finding out the predicted value as well as the actual value. If you run this code (we will show it towards the end in Google Colab also), then for all 300 test observations I have the actual value, and I can also find out the predicted value. So this portion of the code tells me what the actual value of y was and what the predicted value is. For each retailer I have the actual demand and I also have the predicted demand, so I can find the deviation, then the squared deviation, and that will give me the MSE. Next, from sklearn.metrics I am importing mean_squared_error: within the metrics module the mean squared error formula is already implemented, so if I call it on this particular data set I will get the value of MSE. So 56,62,511 is my MSE value.

Now the question is: can I reduce this MSE? You can see there are some discrepancies between the actual data and the predicted data. If this discrepancy reduces, obviously my MSE will also reduce, because when the difference between yᵢ and ŷᵢ reduces, the MSE value reduces as well. So the question is how I can reduce this difference between yᵢ and ŷᵢ, how I can reduce the value of MSE. Now, if you recall, in the regression tree we had gone up to depth 2, isn't it? Node 0 split into node 1 and node 2, then node 1 split further into node 3 and node 4, and node 2 split further into node 5 and node 6. So we have gone up to depth 2. Now, if we increase the depth of the tree, what is your expectation? I would be expecting that my predicted values will be much better. So what we have done is increase the depth of the regression tree and run a small simulation; the output of the simulation is shown here. Initially the depth was 2, and for the test data the MSE value was 56,62,511, while for the training data it was 59,96,464. Then we increased the depth of the tree, and you can see in these values that the training MSE keeps on reducing. The same thing is plotted here: as the depth of the regression tree increases, the blue dotted line, which is for the training data, keeps on reducing. Can you say why? And it will keep on reducing as I increase the depth of the regression tree. I can keep increasing the depth, keep splitting the nodes, and if I keep splitting again and again, at the end I will have only one observation in each node. If I have only one observation in each node, I know the value of y for that observation, so that becomes my predicted value: on the training data the model behaves perfectly well, and the MSE tends to 0 as I keep increasing the depth of the regression tree. But the training data is the data I am using to build the model, so therefore I have to consider the test data.
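Here is a sketch of that small simulation, under the same variable-name assumptions as before; the depth range 2 to 12 follows the plot described in the lecture.

```python
# Sweep the tree depth and compare training MSE versus test MSE, assuming
# X_train, y_train, X_test, y_test exist from the earlier code.
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

for depth in range(2, 13):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    mse_train = mean_squared_error(y_train, tree.predict(X_train))
    mse_test = mean_squared_error(y_test, tree.predict(X_test))
    print(f"depth={depth}  train MSE={mse_train:,.0f}  test MSE={mse_test:,.0f}")
```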
So, I should not use training data for checking the performance of the regression tree model; for that matter, I should not use training data to check the performance of any machine learning model. We have to look at the test data. Now, if you look into the test data, an interesting phenomenon is happening. Going from depth 2 to 3, I can see there is a reduction in the MSE value; from 3 to 4, again a reduction; from 4 to 5, again a reduction; but from 5 to 6 onwards, the MSE increases, and keeps on increasing. So initially, for the test data, my MSE value is reducing, and after that it starts increasing. Now, if I compare the training data versus the test data and their MSE values as we increase the depth of the regression tree, starting from depth 2 and going up to depth 12, the training MSE keeps on reducing, and based on the theory of the regression tree it will keep reducing, but for the test data, after some point it starts increasing again. This phenomenon is called overfitting. Therefore, we have to be very, very careful: I should not focus on the training data and its performance; I have to look at the test data and its performance. In the test data, the point giving the minimum MSE is against the value 5: if the depth of the regression tree is 5, then on the test data I am seeing the best performance. So maybe I can use 5 as the depth of the regression tree and build my prediction algorithm. Instead of depth 2 I will use depth 5, run the regression tree model, and whatever output I get would be my predicted value; the MSE values suggest that this will give me the best prediction as far as the regression tree model is concerned. So 5 seems to be the optimal depth of the regression tree. But if I look at the training data performance, the MSE just keeps on reducing; this phenomenon is called overfitting. On the training data the model is performing very nicely, and I should not fall into that trap and keep increasing the depth of the tree; I have to look at the test data. If you only look into the training data, the model will be overfitted and it will not perform well on the test data.

Now the question is: is there any way to overcome this overfitting issue? Yes, there are ways. Researchers have developed the random forest algorithm, which will help me get rid of this overfitting issue of decision trees. A decision tree comes in two flavours, the classification tree and the regression tree; in this class we are focusing on the regression tree, and in one of the previous lectures, where we talked about maintenance prediction, we used a classification tree. Random forest can be used for both classification trees and regression trees, and it will help me get rid of the issue of overfitting. So now let us see what the random forest algorithm is and how it helps me get rid of the overfitting issue. This is the overall idea of the random forest algorithm: as the word "forest" suggests, a forest means a collection of trees. In the machine learning context, a forest means a collection of trees. What are the trees here? If you focus on regression, they will be regression trees; if I focus on classification, they will be classification trees.
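To illustrate the overfitting point concretely, here is a small hedged example: with no depth limit, a regression tree can memorise the training data, so the training MSE goes to essentially 0 while the test MSE is typically much higher.

```python
# Grow a tree with no depth limit to show overfitting, assuming the same
# X_train, y_train, X_test, y_test variables as before.
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

deep_tree = DecisionTreeRegressor(max_depth=None, random_state=42)
deep_tree.fit(X_train, y_train)  # splits until each leaf is (nearly) pure

print(mean_squared_error(y_train, deep_tree.predict(X_train)))  # essentially 0
print(mean_squared_error(y_test, deep_tree.predict(X_test)))    # typically much higher
```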
So, mainly I need more trees; more trees means more models. This particular approach is called ensemble modeling. Instead of building one model, one tree, we will build multiple models, multiple trees. If I have multiple models, multiple trees, obviously the model will be robust and I will get a better, more robust predicted value. So now let us see how to do this, how to build many models. Random forest means I need many trees; how can I build many trees using the data? I have the initial data with me: let us say this is my training data, and I need to create multiple models out of it, say k models. To create k models I need k different samples, k different sets of observations. How do I create k different sets of observations? I can do this using the technique called bootstrap sampling, also called sampling with replacement. Let us say I have 700 training observations with me. Instead of putting all 700 training observations into one model and getting only one predicted value, we create k different models. The optimal value of k can be found using tuning techniques; we will talk about that later. Let us say you are using k models, where k could be 20 or 30, or it could be 100; assume you are building 100 decision tree models. To build them I need 100 such sample data sets, and those samples I can create using bootstrap sampling, that is, sampling with replacement. So I will have k different samples: sample 1, sample 2, up to sample k. I have k different samples, which means k different sets of observations; in each model the observations are different.

Now, once we make the observations random, there are the features also. In this model we have 7 independent variables: we had age, region, location, size, promotion, number of holidays, and so on. And if you recall, we used only 3 variables in the simple regression tree model; the rest of the variables were not used. So the idea behind random forest is that in each model, in each tree, the observations are randomized by using bootstrap sampling, and the features are also randomized. So if I have k models, let us say this is my model 1, in this case regression tree 1; I will have regression tree 2, regression tree 3, and so on, up to regression tree k. So I have k regression tree models, and in each model the observations are different. Of course, there will be a few common observations which appear in multiple models, but if I compare all the models, the exact same set of observations will not be there in two models. Since I have used the sampling-with-replacement technique, I will get different observations in the first model, different observations in the second model, the third model, and so on up to the kth model. So the observations are different. Then the features are also randomized; I had 7 feature variables.
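Here is a minimal sketch of bootstrap sampling, assuming, purely for illustration, that the 700 training observations sit in a pandas DataFrame called train_df.

```python
# Create k bootstrap samples (sampling with replacement) from the training
# data; train_df is an assumed DataFrame holding the 700 training rows.
import pandas as pd

k = 20  # number of samples / trees, an illustrative choice
bootstrap_samples = []
for i in range(k):
    # Draw 700 rows with replacement: some rows repeat, some are left out.
    sample = train_df.sample(n=len(train_df), replace=True, random_state=i)
    bootstrap_samples.append(sample)
```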
So, features will be randomly selected; maybe 3 features will be randomly selected in each iteration, in each model. Therefore the features are randomly selected and the samples are also randomly selected. So the model which we are getting, regression tree 1, has samples which are randomly selected and features which are randomly selected, and I will get an output. Similarly, regression tree model 2 has different observations, its features are also randomized, and so on. So I will get k different models and k different predicted values. Let us say ŷ₁ is the predicted value I am getting from model 1, ŷ₂ is the predicted value from model 2, and ŷₖ is the predicted value of model k. I will take the average of all of these:

ŷ = (ŷ₁ + ŷ₂ + ... + ŷₖ) / k

So that is my prediction. Instead of one predicted value from one regression tree, I am getting k predicted values from k different regression trees and taking their average; that is my predicted value. I would be expecting that this should be a much more robust prediction compared to a single model. A single model is easy to interpret, but if I have multiple models and multiple predictions and take their average, that will be much more robust, and it will typically give better performance on the test data. So this is the whole technique behind random forest.

Let me quickly summarize once again. What we are doing in a random forest is, instead of developing one regression tree or one classification tree, using a number of trees; forest means a number of trees. In this case we are developing k trees. How can I develop k trees? I need k samples and I need different features. For each tree, whether it is a regression tree or a classification tree, the samples are different, because I am using the bootstrap sampling method, that is, sampling with replacement, and the features are also random. So in each model the features are different and the samples are different; therefore the predicted values I am getting will also be different, and if I take the average of all the predicted values, I get the model prediction. Now, this is for the regression tree. What will happen for the classification tree? The same thing; the steps are exactly the same. I have to do the sampling, so I will use k different samples with replacement, and the samples will be different. Then I will use the feature randomization technique, so each classification tree will have different observations and different features, and will give me a predicted class. Let us assume that classification tree 1 predicts class C1, classification tree 2 predicts class C2, and there are k trees. Suppose I can have only 2 classes, class 1 or class 2; whichever class appears more often out of these k predictions, I will take that as my best predicted value. That is called the maximum frequency of classes, or majority voting. Let us say k = 100 and I got C1 67 times and C2 33 times; the voting tells me that the prediction is C1, because it appears more often. So the idea is the same: instead of one model, develop multiple models, and the result you get will be more robust compared to a single model. So now we understand this technique. What we did was use the random forest regressor from the sklearn module.
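As a hedged illustration of the averaging idea, here is a hand-rolled mini ensemble: k regression trees, each fitted on a bootstrap sample with a random subset of 3 features, with the k predictions averaged. The variable names X_train, y_train, and X_test are assumptions carried over from earlier, treated here as pandas objects.

```python
# Hand-rolled ensemble: bootstrap rows plus a random feature subset per tree,
# then average the k predictions (following the lecture's per-tree description).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
k = 20                 # number of trees, illustrative
n, p = X_train.shape
m = 3                  # features per tree, e.g. 3 of the 7

predictions = []
for i in range(k):
    rows = rng.integers(0, n, size=n)            # bootstrap: rows with replacement
    cols = rng.choice(p, size=m, replace=False)  # random feature subset
    tree = DecisionTreeRegressor(max_depth=5, random_state=i)
    tree.fit(X_train.iloc[rows, cols], y_train.iloc[rows])
    predictions.append(tree.predict(X_test.iloc[:, cols]))

y_pred_ensemble = np.mean(predictions, axis=0)   # average of the k predictions
```

Note that sklearn's RandomForestRegressor randomizes the candidate features at every split (via max_features) rather than once per tree; the sketch above follows the per-tree description used in the lecture.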
So, we have sklearn, under which there are ensemble models, and one of the ensemble models is the random forest regressor, which I am importing. In the random forest regressor I have to mention the number of estimators; the number of estimators means how many trees are there in the forest. I am assuming 20 trees are there; I can change this value and find out what the optimal value is. That is called parameter tuning. We are not going into the details of it, but I would suggest all the participants learn more about parameter tuning and find out what the optimal number of estimators should be, that is, the optimal number of trees in the random forest, which you can find using a parameter tuning technique. And I am taking the maximum depth as 5, because we have seen with the regression tree that on the test data, keeping the maximum depth at 5 gives me the lowest mean squared error. So using these two parameters we are running the model, and we got this output. In this case I cannot print all 20 trees, so I am not printing the random forest here, but I will show you the output.

Now, again, how do I measure the performance of the model? I am using the mean squared error, the same formula which we used for the simple regression tree model, and calculating the MSE value for the test data. I have the 300 observations which we kept for testing purposes. For each of the 300 observations I have the actual value yᵢ and the predicted value ŷᵢ, and I can calculate the MSE using this formula. If you look, the MSE value has been significantly reduced: now it is 37,21,603, and earlier, if I use the regression tree with depth 5, it was more than 44 lakh; now it has reduced to around 37 lakh. So I can see a significant reduction in mean squared error if I use the random forest algorithm compared to the simple regression tree. In both cases I have kept the depth at 5, but I have increased the number of trees in the random forest: instead of one model we have now developed 20 models, and based on these 20 model outputs I have taken the average and got the predicted value. Therefore, you can see the MSE value for the test data has reduced significantly with the random forest. So compared to a simple regression tree, if I use random forest regression I get a better result; that is what it is showing for this particular data set.

Now the question is, how do you select the best forecasting model? In this case we have compared a regression tree and a random forest, and the random forest is showing the better result on the test data as far as the MSE is concerned. There are some steps which you need to follow. First, you need to understand the data: what is the relationship between the dependent variable and the independent variables, and whether the independent variables are interlinked among each other or not. If the data is not clean, I have to clean the data; if there are outliers, we have to do the treatment; likewise if data is missing. So we first need to understand the data in detail. Then we need to choose the evaluation metric. In this case, since we are using a forecasting model, there are a few evaluation metrics available; we have used MSE in our case, but there are other metrics as well. What are these metrics? I will quickly explain. The error term is the forecasted demand minus the actual demand.
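Here is a sketch of the random forest step just described, using the two parameters mentioned in the lecture; everything else is left at sklearn defaults.

```python
# Random forest with 20 trees and maximum depth 5, assuming the same
# X_train, y_train, X_test, y_test variables as before.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=42)
rf.fit(X_train, y_train)

y_pred_rf = rf.predict(X_test)
print(mean_squared_error(y_test, y_pred_rf))  # the lecture reports about 37,21,603
```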
So, F_t is the forecasted demand and D_t is the actual demand, and the error term is e_t = F_t − D_t. Then the mean squared error: I square the error, sum it up, and divide by the number of observations, MSE = (1/n) Σ e_t². Then there is the mean absolute deviation (MAD): instead of taking the square, I take the absolute value of the error term and divide by the number of observations, MAD = (1/n) Σ |e_t|. Then we have the mean absolute percentage error (MAPE): I take the error term, divide it by the actual demand, take the absolute value, and multiply by 100/n, so MAPE = (100/n) Σ |e_t / D_t|. Then there is the bias, which is calculated as the sum of all the error terms, Bias = Σ e_t. If this summation is a very high positive value, what does it mean? Most of the time the error is positive, and that is not good. The bias term should be close to 0; if it is close to 0, then sometimes I am overestimating and sometimes underestimating, sometimes the predicted value is more than the actual value and sometimes less, and that is good: the error term should fluctuate around 0. If I get a very high positive bias or a very large negative bias, in both cases the model is not right; either it is continuously overestimating or continuously underestimating. So we have to look into it properly. Then there is the tracking signal, which is the bias divided by the mean absolute deviation, TS = Bias / MAD.

Depending upon the nature of your problem and your data, you have to see which metric you should choose. In our case we have chosen MSE; sometimes you may choose MAPE, sometimes you may choose MAD. So depending upon your scenario and your context, you have to first decide which metric to use to evaluate the performance of the forecasting model, and then use that metric. Then I have to split the data into training data, testing data, and validation data: we develop the model on the training data and then test it using other data. Also, I should not only focus on one or two algorithms; I need to experiment with various algorithms. In this case we have used a regression tree and a random forest, but you could go into boosting methods such as gradient boosting or XGBoost. If the data pattern is complex, if it is dynamic in nature, maybe a boosting algorithm could give you a better result. Therefore, you should not rely on only one or two forecasting models; you should experiment with multiple forecasting algorithms, find out which one gives you the smaller forecast error, and go ahead with that. Then there is an important concept called hyperparameter tuning: I need to fine-tune the parameters. In our case, while we were doing the regression tree and the random forest, the depth of the tree was one parameter, and in the random forest, the number of trees was another. These are called hyperparameters; I need to tune them and find out their optimal values. There are techniques called grid search and random search, all of which I can use to find the best values of the hyperparameters. Then I have to also consider the interpretability of the model: if the model becomes so complex that the manager cannot interpret it, then there is no use for it. Therefore, you also have to take care that the model you are developing is easy to interpret, and the person who is using the model can interpret it easily.
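For reference, here is a minimal sketch of these forecast-error metrics as one small function; the function name and signature are illustrative, not from the lecture.

```python
# Compute MSE, MAD, MAPE, bias, and tracking signal for a forecast.
import numpy as np

def forecast_metrics(forecast, actual):
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    e = forecast - actual                          # e_t = F_t - D_t
    n = len(e)
    mse = np.sum(e ** 2) / n                       # mean squared error
    mad = np.sum(np.abs(e)) / n                    # mean absolute deviation
    mape = 100.0 / n * np.sum(np.abs(e / actual))  # mean absolute % error
    bias = np.sum(e)                               # should fluctuate near 0
    ts = bias / mad                                # tracking signal
    return {"MSE": mse, "MAD": mad, "MAPE": mape, "Bias": bias, "TS": ts}

# Example usage with the random forest predictions from above:
# print(forecast_metrics(y_pred_rf, y_test))
```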
Then the last point is resource constraints. I can develop a very fascinating model, but it may need, let us say, a cloud computing facility, a large database, and large servers, and if I do not have those, then obviously I cannot implement that model. These are a few important points which you need to keep in mind while you are selecting the forecasting model. So with this we will stop this lecture here; see you in the next lecture. Thank you.