In the last video on classification trees, I mentioned the predictive behavior of a model, by which I basically just meant my model's accuracy. So today I'll explore this a little bit more. Just like last time, I'll use the iris dataset. I'll feed it into a classification tree to develop a model, and then I can send both my data and this model to the Predictions widget to see how well it performs.

The Predictions widget displays the predictions and corresponding probabilities of my classification tree, along with the error rate. In the case of trees, these probabilities are based on the class distribution in the leaf nodes. So I'll go ahead and open the widget, make the first column a little wider, and sort the table by prediction error. A classification tree predicts the probabilities of each class, then chooses the most likely one. At the top of this table, we see it predicted the iris was a versicolor with a 98% probability. But in reality, the flower was a virginica. There are a couple more errors of this kind: in the second row, a versicolor was misclassified as a virginica, and in the third row, a virginica was misclassified as a versicolor. Other than that, the classifications were fine. In a dataset of 150 irises, there were three misclassifications.

So now we can work out the overall classification accuracy of this model. We divide the number of correct classifications by the total number of instances: 150 minus 3, divided by 150. This works out to exactly 98%, the same number as the classification accuracy shown at the bottom of the widget. Pretty good. So now we can go to the flower shop, and we'll only have a 2% chance of guessing our irises wrong.

Moving on, let's try plotting our results to get a better overview. First, I want to see the output predictions in the Distributions widget. I'll choose to display the predicted class and split the columns by the true class.
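The same accuracy arithmetic is easy to reproduce outside Orange. Here's a rough sketch using scikit-learn rather than Orange's widgets (my substitution; note that scikit-learn's default tree is grown deeper than Orange's, so its training accuracy may come out even higher than the 98% in the video):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the 150-row iris dataset and fit a classification tree on all of it.
iris = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Each prediction comes with class probabilities, taken from the
# class distribution in the leaf node the sample lands in.
probs = tree.predict_proba(iris.data)  # shape: (150 samples, 3 classes)

# Accuracy = correct classifications / total number of instances.
correct = (tree.predict(iris.data) == iris.target).sum()
accuracy = correct / len(iris.target)
print(accuracy)
```

Counting correct predictions and dividing by 150 is exactly the calculation done by hand above; `predict_proba` plays the role of the probability columns in the Predictions widget.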
This graph also shows that the tree mostly made the right choice, and that it correctly predicted every single setosa. When predicting versicolors, it made two mistakes, and then another mistake when predicting virginicas. Okay, great. These results all indicate that the tree did pretty well.

But now it's time to experiment a little. I'll completely mess up the data just to see what happens. And by mess up the data, I of course mean I'll shuffle the class column. This will emulate a clueless botanist like myself: I can measure the petals and sepals, so I'll leave those intact, but since I have no clue about the species, I'll just randomly assign one to each of the irises. In Orange, I can do this by feeding my data into the Randomize widget.

Remember, the original data clustered nicely in a petal length versus petal width scatter plot. But take a look at what happens when I randomize the class. To do this, I feed my Scatter Plot the data from the Randomize widget, and I'll turn on replicable shuffling so that everyone gets the same results. The data points are still in the same place, but the colors tell me the classes are all over the place.

Okay, so now I can build a classification tree from this data and send it over to the Predictions widget. Of course, I expect the tree to perform miserably; nobody can build a solid prediction model on random data. I'll quickly get rid of the Scatter Plot, since I don't need it anymore, and rewire the signals to use the randomized data instead. Now I can open Distributions again to see what I get.

Wow, the tree got a surprising number of flowers right. How is that even possible? Most of the predicted setosas really are setosas, and the same goes for versicolors and virginicas. It almost seems like magic. Even when the input was completely randomized, it found a way to make mostly accurate predictions. Or did it? Maybe I did something wrong. I probably did.
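The shuffling experiment can be sketched the same way. Again this uses scikit-learn instead of Orange's Randomize widget (my assumption), with a fixed random seed standing in for "replicable shuffling":

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Shuffle only the class column: the measurements stay intact,
# but each iris gets an essentially random species label.
rng = np.random.default_rng(42)  # fixed seed = replicable shuffling
y_shuffled = rng.permutation(iris.target)

# Fit a tree on the scrambled labels, then score it on the SAME data.
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, y_shuffled)
train_accuracy = tree.score(iris.data, y_shuffled)
print(train_accuracy)
```

An unpruned tree can keep splitting until nearly every training sample sits in its own leaf, so it largely memorizes even random labels. That's why the predictions on the training data look surprisingly good here: the number measures memorization, not real predictive power.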
Bad data in, great model out? I don't think so. I'll have to sort this out in my next video.