Today we are going to discuss classification algorithms that are statistically based. At the end of the session, a student will be able to demonstrate various classification algorithms based on statistics and compare these algorithms. We have a categorization of data mining algorithms: the first, based on statistics, are called statistical algorithms; the second, based on a similarity or distance measure, are called distance based algorithms; and the third, based on the use of if-then rules, are called rule based algorithms. Statistical algorithms are those which are based on some statistical technique.

The first of the statistical algorithms are those of regression. Regression deals with the estimation of an output value based on certain input values. The input values come from the data set D, because we have to use the knowledge stored in that data set to finally perform a classification, and the output values are the classes which are assigned: they represent the classes to be assigned to the different items in the data set. Regression can be used for forecasting; it takes a set of data and fits that data to a formula with inputs, outputs, and coefficients. A simple regression can be thought of as estimating the formula for a straight line. Regression can thus be equated to partitioning the data into two classes: the straight line is the break-even point, the division between the two classes, with one class above the line and one class below it. When we can plot such a line, the regression is of the linear type and is called linear regression, given by the formula y = c0 + c1 x1 + ... + cn xn, where we determine the regression coefficients c0, c1, ..., cn and so find the relationship between the output parameter y and the input parameters x1, x2, ..., xn.
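As a minimal sketch of determining the regression coefficients, here is the one-input case fitted by the closed-form least-squares solution. The lecture does not prescribe a fitting method, so this particular method and the data values are assumptions for illustration:

```python
# Simple linear regression y = c0 + c1*x, fitted by the closed-form
# least-squares formulas (one possible way to determine the coefficients).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope c1 = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1

# Two training points determine the line y = 1 + 2x exactly:
c0, c1 = fit_line([1.0, 3.0], [3.0, 7.0])
print(c0, c1)  # 1.0 2.0
```

With only two points, as in the school-book case, the fitted line passes through both of them exactly; with more points it becomes the best-fit estimate.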
If we look at the formula we have just designed, it is similar to the formula we have studied since school, the equation of a straight line, y = mx + b, where, given two points in the xy plane, m and b are the regression coefficients. The two points represent the training data. Actual data points, however, do not fit the linear model exactly; when we use real data we may not obtain a linear model exactly, so the model generated is an estimate of the actual input-output relationship, and this general linear model can then be used to predict an output value for a given input value. If we attempt to fit non-linear data with this model, the result is a poor model of the data, because we are trying to force non-linear data into a linear form. So the drawback is that the data may not fit the linear model.

The linear model is also made poor by the noise and outliers existing in our system: however much we try to clean them in the cleaning phase of data mining, some noise and outliers still remain. Noise is erroneous data; outliers are exceptions to the usual or expected data. The observed value is therefore described by y = c0 + c1 x1 + ... + cn xn + e, where e is a random error term with mean 0. The accuracy of fit of a linear regression model to actual data can be estimated using a mean squared error function.

There are different approaches to this regression. The first is the division approach, where the data is divided into regions based on class: the data is plotted in n-dimensional space without explicit class values shown, and regression divides the space into regions, one per class, each region representing the items of a particular class. The second approach is that of prediction, where we find and generate a formula to predict the output class value.
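The mean squared error mentioned above can be sketched directly from its definition: average the squared residuals between the observed outputs and the line's predictions. The line and the noisy data values here are invented for illustration:

```python
def mean_squared_error(c0, c1, xs, ys):
    """Average squared residual between observed y and the line c0 + c1*x."""
    n = len(xs)
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / n

# Noisy observations scattered around the line y = 1 + 2x
# (the random error term e has mean 0):
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
print(mean_squared_error(1.0, 2.0, xs, ys))  # about 0.025
```

A smaller mean squared error indicates a better fit of the linear model to the actual data; a large one suggests non-linear data, noise, or outliers.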
A value for each class is included in the graph, and regression generates the formula for a line to predict the class values. I have already told you that for actual real-world data plotting a straight line is difficult, and therefore we go for logistic regression, which may be used to fit the data to logistic curves where linear regression does not work.

The second statistical method is based on the Bayesian approach. Here we assume that the contributions of all attributes are independent and that each contributes equally to the classification problem. Naive Bayes classification is based on the Bayes rule of conditional probability, where we may use measures of belief and measures of disbelief for particular data. The strategy used here is: given a data value xi, the probability that a related tuple ti is in class cj is described by P(cj | xi), and the training data is used to determine P(xi), P(xi | cj), and P(cj). From these we estimate the posterior probability P(cj | xi), and then P(cj | ti) for the whole tuple.

The method is as follows. Given a training set, first estimate the prior probability P(cj) for each class by counting how often each class occurs in the training set. Then, for each attribute xi, count the number of occurrences of each attribute value to determine P(xi). Then estimate P(xi | cj) by counting how often each value occurs within each class in the training data. Here we look at the attribute values: the tuples in the training data have many attributes, each with many values, and from these counts we derive the probabilities used when new tuples must be classified. Thus the Naive Bayes approach is both descriptive and predictive. Here we have some special remarks to be given on Naive Bayes.
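The counting steps above can be sketched as follows. The tiny two-attribute training set is invented purely for illustration; only the counting technique comes from the method described:

```python
from collections import Counter, defaultdict

# Hypothetical training tuples: ((attribute values ...), class label).
training = [
    (("sunny", "hot"),  "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

# Prior P(cj): how often each class occurs in the training set.
class_counts = Counter(label for _, label in training)
n = len(training)
prior = {c: count / n for c, count in class_counts.items()}

# Conditional P(xi | cj): how often each attribute value occurs per class.
value_counts = defaultdict(Counter)  # key (attr_index, class) -> value counts
for attrs, label in training:
    for i, v in enumerate(attrs):
        value_counts[(i, label)][v] += 1

def cond_prob(i, value, cls):
    return value_counts[(i, cls)][value] / class_counts[cls]

print(prior["yes"])                  # 3 of 5 tuples -> 0.6
print(cond_prob(0, "rainy", "yes"))  # 2 of the 3 "yes" tuples -> 2/3
```

One scan of the training data is enough to build all of these counts, which is where the algorithm's efficiency comes from.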
The probabilities are descriptive and are used to predict class membership for a target tuple: when classifying a target tuple, the conditional and prior probabilities from the training set are used to make the prediction, combining the effects of the different attribute values in the tuple.

What is the advantage of Naive Bayes? It is easy to use; only one scan of the training data is required from start to end; we can easily handle missing values by omitting their probabilities when calculating the likelihood of membership, since the probability of their occurrence is very low; and for simple relationships it yields good results.

What is the disadvantage? Naive Bayes does not yield satisfactory results in many situations. Because it is probability based, and attributes are usually not independent even though we want them to be independent so that the classes will be well defined, we sometimes use a subset of attributes, ignoring those that are dependent on others. It also does not handle continuous data: even after dividing the domain into ranges it is difficult, since the division of the domain into these ranges is itself difficult, and this impacts the results produced by this particular algorithm. Thank you.
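Combining the prior and conditional probabilities to classify a target tuple can be sketched like this. The training set, the target tuples, and the use of None to mark a missing value are all assumptions made for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical training tuples: ((attribute values ...), class label).
training = [
    (("sunny", "hot"),  "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

class_counts = Counter(label for _, label in training)
n = len(training)
value_counts = defaultdict(Counter)  # (attr_index, class) -> value counts
for attrs, label in training:
    for i, v in enumerate(attrs):
        value_counts[(i, label)][v] += 1

def classify(tuple_attrs):
    """Pick the class cj maximizing P(cj) * product over i of P(xi | cj).

    A missing attribute value (None) is handled by simply omitting its
    probability from the product, as described above."""
    best_class, best_score = None, -1.0
    for cls, count in class_counts.items():
        score = count / n  # prior P(cj)
        for i, v in enumerate(tuple_attrs):
            if v is None:
                continue  # omit the probability for a missing value
            score *= value_counts[(i, cls)][v] / count  # P(xi | cj)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

print(classify(("rainy", "cool")))  # "yes": no "no" tuple is rainy
print(classify(("sunny", None)))    # "no": sunny dominates the "no" class
```

Note how the product multiplies together the contribution of every attribute value in the tuple, which is exactly the independence assumption: if the attributes are in fact dependent, these factors double-count evidence, and that is the source of the disadvantage discussed above.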