In our previous video, I began our chapter on classification. You may remember that last time we took a look at the iris dataset. So, opening it up in Orange again, notice the leftmost column is colored gray. We refer to this iris column as the class, or target variable. We can also take a further look at the structure of this dataset's features with the Select Columns widget. Apart from our categorical target variable, which tells us what kind of flower we're dealing with, there are also four more features, which describe the flower with various measurements.

So, let's try to plot this data out. Okay, it looks a little bit messy. The setosas are way off on one side, but the versicolors and virginicas are completely mixed up. Luckily, we can press the Find Informative Projections button. This will help us find a combination of features that separates the three classes as much as possible. And as it turns out, a combination of petal length and petal width yields the best results. Now, Orange has a bunch of features like this hidden all over the place, and we try to do our best to keep them well documented, but some can slip through the cracks. On the bright side, though, if you go looking, you might find some nice easter eggs.

Anyway, we were talking about flowers. So remember, my goal from the previous video was to find a model that could predict the species of irises based on their measurements. And the truly amazing part here, as you'll see, is that this model is so simple that I don't even need a computer: I can memorize it and use it in practice. So, let me dive into this visualization. Now, I could draw a line that separates the setosas from the other two species, right about there. And I can even write this down: if petal length is less than 2, then setosa. Else? Well, what happens when the petal length is larger than 2? We can try to draw another line that separates the versicolors from the virginicas, about here, at a petal width of 1.7.
And let's write this condition down too: if petal width is greater than 1.7, then virginica. Else, versicolor. Okay, I could keep going further and try to deal with these outliers here, but my point is that with only two rules, I can already build a pretty good classifier. Now, this set of rules I just wrote down can also be represented as a tree. I start with all the data, then I split it into subsets that get more and more accurate at predicting the target.

So, the obvious next step is to see what an actual tree would look like in Orange. Let's use the Tree widget, and then we can visualize the results with the Tree Viewer. There we go. It looks similar to what I made: I split my data by petal length first to get all the setosas on the left, and the second split was according to petal width. Then you can see that this tree actually keeps on going a little bit further, until its leaves are clean enough. Now, if we try clicking on any of the nodes, we'll get the corresponding data instances on the output. So let's try this one here. The viewer is now outputting 54 irises. It looks like most of them are versicolors, but there are also five virginicas.

It can often be useful to pair these kinds of selections with a scatter plot, connected to the viewer's Selected Data output. So, opening the two widgets side by side, I can explore my classification tree. The root node contains all the data instances. Here on the left, we find all the setosas. And on the right is a mixture of virginicas and versicolors. This mix is further split by petal width into a group with mostly versicolors, and a group with 45 virginicas and one versicolor. Classification trees are one of the simplest prediction algorithms, but as we've seen, they can already be pretty effective.
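By the way, the two hand-written rules from the scatter plot can be sketched as a tiny function. This is just an illustration in plain Python (an assumption on my part; the video stays entirely inside Orange's visual interface), with a few made-up but typical measurements for each species:

```python
def classify(petal_length, petal_width):
    """The two rules read off the scatter plot (measurements in cm)."""
    if petal_length < 2:
        return "setosa"       # rule 1: petal length < 2 -> setosa
    if petal_width > 1.7:
        return "virginica"    # rule 2: petal width > 1.7 -> virginica
    return "versicolor"       # everything else

# Hypothetical sample measurements, typical of each species:
samples = [(1.4, 0.2), (4.5, 1.5), (5.8, 2.2)]
for pl, pw in samples:
    print(f"petal length {pl}, petal width {pw} -> {classify(pl, pw)}")
```

As in the tree viewer, the rules are not perfect: a handful of versicolors and virginicas near the 1.7 boundary end up in the wrong leaf, which is exactly the outliers mentioned above.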
Now, I skimmed over a couple of topics, and I feel like I still owe you a better explanation of how exactly Orange knows when to stop growing the tree, and how it selects the features for splitting the data. But time is running out, so you'll have to wait until our next video.
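As a rough preview of what "selecting a feature for splitting" can look like, here is a minimal sketch of one common approach: try every candidate threshold on a feature and keep the one with the lowest weighted Gini impurity. Gini is just one standard criterion, and whether Orange's Tree widget uses exactly this is not covered here; the toy data below is made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, score) minimizing the weighted impurity
    of the two subsets produced by splitting at the threshold."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v < threshold]
        right = [l for v, l in zip(values, labels) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Toy petal widths (cm) with their species:
widths = [0.2, 0.3, 1.4, 1.5, 1.8, 2.2]
species = ["setosa", "setosa", "versicolor", "versicolor",
           "virginica", "virginica"]
# The chosen threshold splits the pure setosa group off first,
# mirroring the first split in the tree viewer.
print(best_split(widths, species))
```

A real tree learner repeats this search over every feature at every node, which is also where stopping criteria come in: once a node is pure enough, or too small, it stops splitting.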