We already know how to pre-process our corpus and how to find similar documents with hierarchical clustering. But Grimm's tales also have a label. The tales are either animal tales or tales of magic. Can we use this data to predict the type of a new, unclassified tale?
We load the Grimm's Tales Selected corpus and check the data in Corpus Viewer. Tale labels are provided in the ATU Topic field. We prepare the text with pre-processing and turn it into a bag of words, which represents each tale as a vector of word counts.
Now for the classification. Connect Logistic Regression to Bag of Words. Logistic Regression constructs a model that predicts whether a tale is an animal tale or a tale of magic. We can even see what our model looks like. We will use Nomogram, which visualizes the logistic regression classifier. Connect it to Logistic Regression. The widget displays the top 10 words that are important for the classifier. At the top of the list are the words that contribute most to the prediction. It seems the word "fox" can tell us a lot about a tale. If "fox" appears often in the text, it's an animal tale. If it doesn't, it's probably a tale of magic.
Now we know how our classifier works, and it's time to see whether it also performs well. Connect Test and Score to Bag of Words. We will use Test and Score to cross-validate the logistic regression model. Not bad: the area under the ROC curve is over 0.9. When given two tales of different classes, logistic regression correctly distinguishes between them in over 90% of cases.
But we said we want to predict the tale type, right? And we don't want to predict something we already know. We will place a new Corpus widget on the canvas. Let us load three new tales from Hans Christian Andersen. We will ask our logistic regression model to tell us the types of these new tales. Connect Corpus (1) to Predictions. Now provide the pre-trained Logistic Regression model and observe the results in Predictions.
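For readers who want to see what happens behind the widgets, here is a minimal sketch of the same idea in plain Python: pre-process, build a bag of words, fit a logistic regression, look at the most influential words, and predict the type of new text. It is not Orange's implementation. The six training sentences, the two new sentences, the stop-word list, and all function names are made up for illustration, and the model is simplified (no intercept term, plain gradient descent).

```python
import math
from collections import Counter

# Hypothetical stop-word list, just enough for the toy sentences below.
STOP_WORDS = {"the", "a", "and", "who", "on", "to", "for", "into", "while", "his"}

def tokenize(text):
    """Simple pre-processing: lowercase, split on whitespace, drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def bag_of_words(tokens, vocabulary):
    """Represent a document as a vector of word counts over the vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, learning_rate=0.5, epochs=200, l2=0.01):
    """Fit logistic regression weights by batch gradient descent.

    Simplification (assumption): no intercept term, small L2 penalty.
    """
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        # Prediction error for every document with the current weights.
        errors = [
            sigmoid(sum(w * x for w, x in zip(weights, row))) - target
            for row, target in zip(X, y)
        ]
        for j in range(len(weights)):
            gradient = sum(e * row[j] for e, row in zip(errors, X)) / n
            weights[j] -= learning_rate * (gradient + l2 * weights[j])
    return weights

def predict(text, weights, vocabulary):
    """Return the predicted label and the probability of 'Animal Tales'."""
    row = bag_of_words(tokenize(text), vocabulary)
    p_animal = sigmoid(sum(w * x for w, x in zip(weights, row)))
    label = "Animal Tales" if p_animal >= 0.5 else "Tales of Magic"
    return label, p_animal

# Hypothetical miniature corpus, NOT the Grimm data from the video.
texts = [
    "the fox tricked the wolf and the hare ran into the forest",
    "the fox and the cat met a wolf near the farm",
    "a hare raced the fox while the wolf slept",
    "the princess found a witch who cast a spell on the king",
    "the king promised his daughter to the prince who broke the spell",
    "a witch gave the prince a magic ring for the princess",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = Animal Tales, 0 = Tales of Magic

vocabulary = sorted({word for text in texts for word in tokenize(text)})
X = [bag_of_words(tokenize(text), vocabulary) for text in texts]
weights = train(X, labels)
weight_of = dict(zip(vocabulary, weights))

# Rough analogue of the Nomogram view: words with the largest absolute weight.
top_words = sorted(weight_of.items(), key=lambda item: abs(item[1]), reverse=True)[:5]
for word, weight in top_words:
    print(f"{word:10s} {weight:+.2f}")

# Hypothetical new, unlabeled sentences. Unknown words are simply ignored.
new_texts = [
    "a duckling met the fox and the hare",
    "a girl found a witch and a king",
]
predictions = []
for text in new_texts:
    label, p_animal = predict(text, weights, vocabulary)
    predictions.append(label)
    print(f"{label:15s} P(animal)={p_animal:.2f}  {text}")
```

A positive weight pushes the prediction toward an animal tale and a negative one toward a tale of magic, which is the same reading the Nomogram gives for a word like "fox".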
Our model says The Ugly Duckling is an animal tale, and The Little Match Seller a tale of magic. Seems quite right. The predicted class probabilities were high as well. The probability that The Ugly Duckling is an animal tale is 90%. Today we've learned how to inspect our logistic regression model with Nomogram, how to reuse the familiar classification workflow on text, and how to predict the type of a tale in a new corpus. Working with text in Orange is just as simple as working with spreadsheets.
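As a footnote to the Test and Score step: the area under the ROC curve can be computed directly as the share of (animal tale, tale of magic) pairs in which the animal tale receives the higher predicted probability of being an animal tale. The short sketch below shows that pairwise reading. The probabilities are invented for illustration and are not results from the video, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical predicted probabilities of the class "Animal Tales".
animal_tale_scores = [0.9, 0.8, 0.7, 0.4]       # tales that really are animal tales
magic_tale_scores = [0.6, 0.3, 0.2, 0.1, 0.05]  # tales that really are tales of magic

auc = pairwise_auc(animal_tale_scores, magic_tale_scores)
print(f"AUC = {auc:.2f}")  # 19 of the 20 pairs are ordered correctly -> 0.95
```

Only one pair is ranked the wrong way round here (the animal tale scored 0.4 against the tale of magic scored 0.6), so the classifier separates the two classes in 95% of the pairs.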