Today, we are going to discuss classification algorithms which are rule-based. A statement written in the if-then form is called a rule. At the end of this session, students will be able to demonstrate various rule-based classification algorithms and compare them. Rule-based classifiers classify records by using a collection of if-then rules, as I have told you. A rule has a condition on the left-hand side and an output on the right-hand side, which is a class label. The left-hand side is called the antecedent and the right-hand side is called the consequent. We have a rule-based classifier example given here, and a rule generated from this data set: if gives birth is no and can fly is yes, then the class is bird. So the arrow stands for the "then" of the statement. Rule-based classifiers are thus applied to classify our data into various classes.

Rule coverage and accuracy are two terms used when we are constructing rules. Coverage is the fraction of records that satisfy the antecedent of a rule, whereas accuracy is the fraction of records satisfying the antecedent that also satisfy the consequent.

Let us pause and think for a while about how rules help us to do classification. From our data set we generate various rules, and a record of interest triggers whichever rules it satisfies, and that is what classifies it. Using rule R3, you see that the record is classified as a mammal. The turtle triggers both rules R4 and R5, because it has the attributes of both, and the dogfish shark triggers none of the rules above.

The characteristics of rule-based classifiers are that the rules generated may be mutually exclusive or exhaustive. The classifier is mutually exclusive if every record is covered by at most one rule, and exhaustive if every record is covered by at least one rule. From a decision tree we can generate rules, and these rules are mutually exclusive and exhaustive; they contain as much information as the tree. Rules can then be simplified, going from the initial rules to simplified rules. After simplification, rules are no longer mutually exclusive, so you use an ordered rule set or an unordered rule set with a voting scheme; and rules are no longer exhaustive, so you can always use a default class.

An ordered rule set ranks the rules and gives each a priority; when a test record is presented to the classifier, it is assigned the class of the highest-ranked rule it triggers. This is shown here through examples. The rule ordering schemes are rule-based ordering, where individual rules are ranked based on their quality, and class-based ordering, where rules that belong to the same class appear together.

For building rules, you may have a direct method, which extracts rules straight from the data and is used in RIPPER, CN2 and Holte's 1R, or an indirect method, used in C4.5rules, where we extract rules from other classification models. Then we may use sequential covering, which starts from an empty rule and grows a rule until it best represents the data set and a stopping criterion is met. An example of sequential covering is shown here: positive examples that lie together are grouped, that group is converted into rule form, the covered examples are removed, and we then look for the next group of positives.
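To make the triggering of rules, the default class, and the coverage and accuracy measures concrete, here is a minimal sketch in Python. The rules and records are only illustrative stand-ins inspired by the vertebrate example: apart from the bird rule quoted above, the attribute names, rule bodies, records and the default class are assumptions, not the exact data set shown in the lecture.

```python
# Each rule is (antecedent, class_label); the antecedent maps attribute -> required value.
rules = [
    ({"gives_birth": "no",  "can_fly": "yes"},     "bird"),    # the rule quoted in the lecture
    ({"gives_birth": "yes", "blood_type": "warm"}, "mammal"),  # assumed form of an R3-like rule
    ({"lives_in_water": "yes"},                    "fish"),    # illustrative only
]
default_class = "reptile"  # assumed default class, used when no rule is triggered

def matches(antecedent, record):
    """A record satisfies the antecedent if it meets every condition in it."""
    return all(record.get(attr) == value for attr, value in antecedent.items())

def classify(record, rules, default):
    """Ordered rule set: return the consequent of the first (highest-priority) triggered rule."""
    for antecedent, label in rules:
        if matches(antecedent, record):
            return label
    return default  # exhaustiveness is restored through the default class

def coverage_and_accuracy(rule, records):
    """Coverage: fraction of records satisfying the antecedent.
    Accuracy: fraction of covered records whose class also matches the consequent."""
    antecedent, label = rule
    covered = [r for r in records if matches(antecedent, r)]
    coverage = len(covered) / len(records)
    accuracy = sum(r["class"] == label for r in covered) / len(covered) if covered else 0.0
    return coverage, accuracy

# A few illustrative records (not the lecture's table).
records = [
    {"name": "hawk",   "gives_birth": "no",  "can_fly": "yes", "blood_type": "warm",
     "lives_in_water": "no",        "class": "bird"},
    {"name": "lemur",  "gives_birth": "yes", "can_fly": "no",  "blood_type": "warm",
     "lives_in_water": "no",        "class": "mammal"},
    {"name": "turtle", "gives_birth": "no",  "can_fly": "no",  "blood_type": "cold",
     "lives_in_water": "sometimes", "class": "reptile"},
]

for r in records:
    print(r["name"], "->", classify(r, rules, default_class))
print(coverage_and_accuracy(rules[0], records))
```

Changing the order of the rules changes the prediction for a record that triggers more than one rule, which is exactly why the rule-based and class-based ordering schemes matter.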
So we have rule growing, instance elimination, rule evaluation, a stopping criterion and rule pruning, which are the steps done in sequential covering. Rule growing means growing a rule either from more specific to general or from general to specific; you follow a particular strategy, and you may use the CN2 algorithm or the RIPPER algorithm as the case may be. Then there is instance elimination, where you eliminate the instances covered by a rule so that the next rule is different and there is no overlap between rules; removing the covered negative instances also prevents underestimating the accuracy of the rule. You compare the two rules in the diagram to see which is better and go for that one. You calculate the evaluation metrics using these formulas, where n is the number of instances covered by a rule, nc is the number of those covered instances that belong to the class predicted by the rule, k is the number of classes and p is the prior probability of that class. Hence the accuracy, Laplace and m-estimate metrics of the rules can be evaluated; a small code sketch of these metrics and of the covering loop is given at the end of this part.

The stopping criterion and rule pruning decide where we should stop growing a rule. You may use a stopping criterion of computing the gain: if the gain is not significant, discard that rule. For rule pruning, similar to the post-pruning of a decision tree, use reduced error pruning: remove one of the conjuncts of the rule, compare the error rate on the validation set before and after pruning, and if the error improves, prune the conjunct. A summary of these direct methods is: grow a single rule, remove the instances it covers, prune the rule if necessary, and add the rule to the current rule set; repeat this procedure until you get the rule set that best represents the data set.

In RIPPER, for a two-class problem, we choose one of the classes as the positive class and the other as the negative class. We learn rules for the positive class; the negative class becomes the default class. We also apply this to multi-class problems, where we may not have only two classes but many classes, which are ordered by size: you learn rules for the smallest class first, treating the rest as the negative class, and then repeat with the next smallest class as the positive class. For RIPPER, growing a rule again starts from an empty rule: add conjuncts to the rule and stop when the rule no longer covers negative examples. Then prune the rule: compute the pruning measure, apply the pruning method, and stop when you have properly designed your rules. To build the rule set, use the sequential covering algorithm, which finds the best rule that covers the current set of positive examples. Each time a rule is added to the rule set, compute the new description length, and optimize the rule set by keeping the rules that minimize the description length (the MDL principle). Repeat the rule generation and the rule optimization for the remaining positive examples.

There are also indirect methods, which go from a tree to a rule set; these are used in C4.5rules. Extract the rules from the tree; once they are extracted, prune each rule by removing conjuncts, compare the rules, and repeat this procedure until you get the best rules with respect to the generalization error. Instead of ordering the individual rules, you now order subsets of the rules, which is also termed class-based ordering. As an example, for this particular data set we have generated a tree; you could obtain the rules for it using C4.5rules and using RIPPER, then apply them and see which gives the better rules.
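The rule evaluation metrics referred to above have simple closed forms. A minimal sketch, assuming the usual definitions of n, nc, k and p given in the lecture:

```python
def rule_accuracy(n, nc):
    """Accuracy = nc / n."""
    return nc / n

def laplace_estimate(n, nc, k):
    """Laplace = (nc + 1) / (n + k)."""
    return (nc + 1) / (n + k)

def m_estimate(n, nc, k, p):
    """m-estimate = (nc + k*p) / (n + k)."""
    return (nc + k * p) / (n + k)

# Example: a rule covering 60 instances, 50 of them of the predicted class,
# with 2 classes and a prior of 0.5 for the predicted class.
print(rule_accuracy(60, 50), laplace_estimate(60, 50, 2), m_estimate(60, 50, 2, 0.5))
```

The overall sequential covering loop can be sketched as below. This is only an outline under assumptions: the rule representation and the greedy general-to-specific growth are simplified stand-ins, not the exact CN2 or RIPPER procedures.

```python
def covers(antecedent, record):
    """A rule covers a record if the record satisfies every conjunct."""
    return all(record.get(a) == v for a, v in antecedent.items())

def learn_one_rule(records, target, attributes):
    """Grow one rule general-to-specific: keep adding the single conjunct
    (attribute = value) that most improves accuracy on the target class."""
    antecedent = {}
    while True:
        covered = [r for r in records if covers(antecedent, r)]
        best, best_acc = None, sum(r["class"] == target for r in covered) / len(covered)
        for attr in attributes:
            if attr in antecedent:
                continue
            for value in {r[attr] for r in covered}:
                candidate = dict(antecedent, **{attr: value})
                cand_cov = [r for r in covered if covers(candidate, r)]
                acc = sum(r["class"] == target for r in cand_cov) / len(cand_cov)
                if acc > best_acc:
                    best, best_acc = candidate, acc
        if best is None:          # no conjunct improves the rule: stop growing
            return antecedent
        antecedent = best

def sequential_covering(records, target, attributes):
    """Repeat: learn one rule for the positive class, add it to the rule set,
    and eliminate the instances it covers (instance elimination)."""
    rule_set, remaining = [], list(records)
    while any(r["class"] == target for r in remaining):
        antecedent = learn_one_rule(remaining, target, attributes)
        if not antecedent:        # stopping criterion: no useful rule could be grown
            break
        rule_set.append((antecedent, target))
        remaining = [r for r in remaining if not covers(antecedent, r)]
    return rule_set

# Usage with the illustrative records from the earlier sketch:
# print(sequential_covering(records, "bird", ["gives_birth", "can_fly", "blood_type", "lives_in_water"]))
```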
And the advantages of rule-based classifiers are that they are as highly expressive as decision trees, they are easy to interpret and easy to generate, they can classify new instances rapidly, and their performance is comparable to that of decision trees. These are the references we have used. Thank you.