Today, we are going to focus on classification and prediction. I am Dr. R. V. R. Giddhi, professor in the Department of Computer Science and Engineering at W.R.T. Sholapur. The learning outcome of today's video: at the end of this video, students will be able to comprehend the concepts of classification and prediction. The contents of this video are classification, prediction, and the classification process.
Let us see what classification is. Classification is a form of data analysis that extracts models describing important data classes. It assigns data to two or more categories based on class labels. For example, the labels may be yes or no, male or female, or positive or negative. In the same way, items can be classified into two categories or into any number of categories. Example: bank loan approval for customers based on the customer's age. Suppose a customer aged around 52 applies for a loan. The loan officer decides, based on the age, whether the loan should be approved or not, or whether it should be approved for one amount or another. Based on the customer's remaining years of service, the officer can approve this particular loan.
What is prediction? Prediction is the term used when the trained classifier is used to predict the class of data that is unknown to the classifier model. Prediction models are continuous-valued functions, that is, they predict unknown or missing values. Examples are fraud detection and medical diagnosis. In medical diagnosis, if a patient is suffering from a particular disease, say cancer, which treatment should he receive: A, B, or C? Based on the symptoms, we can classify the case and predict whether he should receive treatment A, B, or C. So let us take this question: what is the basic difference between classification and prediction?
Take a pause here, think for a minute, and give the answer. The answer: classification is the process of building a classifier model from data whose class labels are known. Prediction is the process of using that classifier model to find the outcome, that is, the value of the class label, for unknown data. So in classification we know the label or class of the data, but in prediction we do not.
Let us go to the process of classification. Classification is a two-step process. The first step is training the classifier, also called model construction, and the second is testing the classifier, also called model usage. Now let us see what training is. The model is trained by describing a set of predetermined classes in the training data set. Here each record (tuple, or sample) in the training data set is assumed to belong to a predefined class, as given by the class label attribute. This means that whenever we give the training data set, we know the class of each record: it belongs to either category A or category B. The set of tuples, that is, records, used for model construction is the training set. The model can be represented in different ways: as classification rules, as a decision tree, or as a mathematical formula.
Then testing is done to predict the class categories of data records that are unknown to the model. Here the known label of each test sample is compared with the result classified by the model. After this comparison, the accuracy rate is determined: it is the number of test samples the model classified correctly divided by the total number of test samples tested. The test data set is independent of the training set so that the model's accuracy can be judged fairly.
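The testing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the lecture: `holdout_split` and `accuracy_rate` are hypothetical helper names, and the labels are made up. The split keeps the training and test records disjoint, and the accuracy rate is the number of correctly classified test records divided by the total number of test records.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into disjoint training and test lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def accuracy_rate(known_labels, predicted_labels):
    """Correctly classified test records divided by the total number of test records."""
    if len(known_labels) != len(predicted_labels):
        raise ValueError("each test record needs one known and one predicted label")
    if not known_labels:
        raise ValueError("the test set is empty")
    correct = sum(1 for known, predicted in zip(known_labels, predicted_labels)
                  if known == predicted)
    return correct / len(known_labels)


if __name__ == "__main__":
    train, test = holdout_split(range(10), test_fraction=0.3)
    print(len(train), len(test))  # 7 3
    # Made-up labels: the model gets 3 of the 4 test records right.
    known = ["yes", "no", "yes", "yes"]
    predicted = ["yes", "no", "no", "yes"]
    print(accuracy_rate(known, predicted))  # 0.75
```

A seeded shuffle is used only to make the split repeatable; any random split that keeps the two sets disjoint serves the same purpose of judging the model on records it has not seen.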
Remember that the records used for testing are never the same as those used for training. Let us look at the first figure, training of the classifier model. On the left-hand side is the training data, which gives each faculty member's name, rank, number of years worked, and whether he or she is tenured (yes or no), that is, whether the position is permanent. For example, Mike is an assistant professor with three years of experience, and in the training data set his tenured value is no. For Mary it is yes, for Bill it is yes, for Jim it is yes, and so on. On the right-hand side, this training data is given to the classification algorithm. There are different types of classification algorithms, and any one of them can be used: for example, decision trees, Bayesian classification, rule-based classification, backpropagation (neural networks), SVM (support vector machines), and classification based on association rules. The output of the algorithm is the classifier model, and this model classifies the data into different classes. Here the learned rule is: if rank is professor or the number of years worked is greater than six, then tenured is yes; otherwise, tenured is no. Classification is done based on this rule.
Now look at the second figure, testing of the classifier model. The classifier model produced from the training data is applied to the testing data and then to unseen data, that is, data different from what was provided during training. So look at this: Jeff is a professor with four years of experience.
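The rule learned in the training figure can be written as a small function. This is a sketch only: `classify_tenure` is a hypothetical name, and the two assistant-professor records with 7 and 6 years are invented to show the boundary of the "greater than six" condition; Mike and Jeff come from the figures.

```python
def classify_tenure(rank: str, years: int) -> str:
    """IF rank = professor OR years > 6 THEN tenured = yes, ELSE tenured = no."""
    # Compare ranks case-insensitively; only the exact rank "professor" qualifies,
    # so "Assistant Professor" or "Associate Professor" falls through to the years test.
    if rank.strip().lower() == "professor" or years > 6:
        return "yes"
    return "no"


if __name__ == "__main__":
    records = [
        ("Mike", "Assistant Professor", 3),            # training data (label: no)
        ("Jeff", "Professor", 4),                      # unseen record in the testing figure
        ("Hypothetical A", "Assistant Professor", 7),  # invented: more than six years
        ("Hypothetical B", "Assistant Professor", 6),  # invented: exactly six years
    ]
    for name, rank, years in records:
        print(name, classify_tenure(rank, years))
```

Running it prints `no` for Mike (matching his training label), `yes` for Jeff, `yes` for the seven-year record, and `no` for the six-year record, because six is not greater than six.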
Applying the rule: if the faculty member is a professor, tenured is yes; if he is not a professor, say an assistant professor, but has more than six years of experience, tenured is yes; and if he is an assistant professor with six or fewer years of experience, tenured is no. Since Jeff is a professor, the model's output is yes. In the same way, the model's output for each testing record is compared with its known label, and here the outputs match, so an accurate result has been obtained.
Let us now see how classifiers and predictors are compared. Classifier and predictor models can be compared on the following factors: accuracy, speed, robustness, scalability, and interpretability. The references used are shown here. I hope you have understood the concept of classification. Thank you.