In our previous videos, we introduced one of the simplest classifiers, logistic regression. Today, the first step, like last time, is to paint some data like this. For convenience, let me rename the variables x and y to x1 and x2, to make clear that they are independent variables and represent the inputs of our model.

Now, I'll feed this data into the Logistic Regression widget and then send the developed model to the Predictions widget to use on our training set. Evaluating a model on its own training set is prone to overfitting, but at the moment I'm only interested in the decision boundary. So let's visualize it in our scatter plot by coloring the points according to the predicted class, and let's also display each point's true class with its shape.

Logistic regression develops a linear model. In two dimensions, it finds a line that best separates the instances of each class. Our decision boundary is somewhere around here, with all the blue points on one side and all the red points on the other. The decision boundary found by logistic regression is represented as z equals w0 plus w1 times x1 plus w2 times x2. Here, z is proportional to the distance from the line, so points with a z of 0 lie directly on the decision boundary. Data points with a positive z are above the decision boundary; logistic regression classifies them as c1, the blue class. Those with a negative z are below the decision boundary and are classified as c2.

Logistic regression also outputs class probabilities: when the distance to the decision boundary is large and positive, the probability of the first, so-called target class c1 is high, and when z is large and negative, the probability of this class is small. So let's visualize this probability with the size of the points. You can see the probability approaches 1 as we move up and away from the decision boundary, and 0 as we move down in the other direction. To transform the weighted sum z into a probability, logistic regression uses the logistic function, called g. The probability of the target class equals g of z, that is, 1 over 1 plus e to the negative z. Let's plot this function: when z is large, the output y, that is, the probability of c1, approaches 1, and when z is large but negative, the output y approaches 0.

We explained this in our previous videos, so I apologize for the repetition, but I need to remind everyone of the linear decision boundary and how the distance from it transforms into probabilities. Let me schematically represent the inner workings of our logistic regression. For our two-dimensional dataset, logistic regression receives a constant of 1 and the features x1 and x2. It weights these inputs with w0, w1, and w2, computes their sum, and then transforms it with the logistic function g to obtain the probability y at the output.

Great. Now let's add some more data. I'll draw in some extra blue points like this. Obviously, it's impossible to split these two classes with just a single line. Just take a look at the scatter plot: here, logistic regression fails miserably. This is the decision boundary, and you can see many of the points are misclassified. So, returning to our painted data, we need a more complex decision boundary. Maybe we could compose it from two lines, like this. The points below these two lines should be red, and the rest should be blue. It looks like we might need two logistic regressions, one for each line, and then another that combines their two outputs.
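Before we draw that combination, here is a minimal Python sketch of the single logistic regression we just walked through: the weighted sum z and the logistic function g that turns it into a probability. The weight values are made up purely for illustration; they are not the ones Orange would fit to the painted data.

```python
import numpy as np

# Illustrative weights for z = w0 + w1*x1 + w2*x2 (made-up values, not fitted).
w0, w1, w2 = -1.0, 2.0, 3.0

def z(x1, x2):
    """Weighted sum, proportional to the signed distance from the decision boundary."""
    return w0 + w1 * x1 + w2 * x2

def logistic(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Points with z > 0 get a probability of c1 above 0.5,
# points with z < 0 a probability below 0.5.
for x1, x2 in [(1.0, 1.0), (0.0, 0.0), (-1.0, -1.0)]:
    p = logistic(z(x1, x2))
    print(f"x1={x1:+.1f}, x2={x2:+.1f}  ->  z={z(x1, x2):+.1f}, P(c1)={p:.3f}")
```

The further a point lies on the positive side of the boundary, the closer its probability gets to 1; on the negative side, it approaches 0, exactly as the point sizes in the scatter plot showed.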
So let's draw this combination of logistic regressions. Here, two logistic regressions are fed the input features x1 and x2 directly, and a third combines their outputs. I'll omit the weight symbols to keep the graphic legible, but note that each arrow carries one. The weights are the parameters of our model: each of the two logistic regressions on the inputs has three weights (for the constant, x1, and x2), which makes six, and the third has another three, so nine parameters that must be inferred from our training data. Three logistic regressions together, or a network of logistic regressions.

For historical reasons, such a network of logistic regressions is called a neural network; neural because some researchers in the past were inspired by the way neurons process information. Neurons also receive inputs through synapses and dendrites, sum them up in the cell body, and, if the signal is sufficiently strong, fire along the axon. A little bit like logistic regression, perhaps. In any case, the artificial neural network we've graphically assembled here contains three logistic regressions and two layers. One layer, called the hidden layer, includes two neurons. It processes the input data and prepares it for the output layer, where we have one more neuron and where we expect the data this layer receives to be linearly separable with respect to the class.

We can construct such a neural network using the Neural Network widget in Orange. We'll instruct it to contain only two neurons in the hidden layer, use the logistic function for activation, and use an L-BFGS solver to compute the weights from the training data. Let's pass this newly created model to the Predictions widget and change the scatter plot's color and size to show the results of our neural network. Wow, that works remarkably well: most of the circles are blue and the crosses are red, just as they should be. Our small neural network with just three logistic regressions can easily cope with this data set.

I can, however, easily construct a data set where my network will fail. Let me just paint another set of blue points here. The scatter plot shows that our network with two logistic regressions in the hidden layer cannot cope with this data set: several circles on the left that belong to c1 are colored red, while they should be blue. We would need three logistic regression models to develop a classifier that combines models with linear decision boundaries, something like this. To do this, I only need to instruct Orange to build a neural network with three neurons in the hidden layer, so let me change this parameter in the Neural Network widget. And here's the result in the scatter plot. Wow, we win again. A network with three logistic regressions in the hidden layer and one logistic regression at the output can produce a suitable model for our data set with a relatively non-trivial decision boundary.

This, however, is just the tip of the iceberg as far as neural networks go, and a few warnings are in order before we continue. First, we've been playing with a two-dimensional data set, where it was easy to devise just the right neural network architecture. Multidimensional data sets may require more complex decision boundaries, and figuring out the correct architecture is also hard, even more so since there may be multiple hidden layers rather than just one. And think about data sets of images, text, or any other unstructured data: while the neural networks for those applications are large, they stem from the same idea of combining simple models, like logistic regression, to fit possibly very complex data sets.
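If you would rather script this experiment than click through widgets, roughly the same model can be sketched with scikit-learn's MLPClassifier, which exposes the same choices we made in the Neural Network widget: the number of neurons in the hidden layer, the logistic activation, and the L-BFGS solver. The toy data below is only a stand-in for the painted dataset, with the red class sitting in a band between two groups of blue points, so no single line can separate the classes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in for the painted data: class c2 (red) sits in a band between
# two groups of class c1 (blue) points, so one line cannot separate them.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.9],   # blue, upper-left
              [0.4, 0.5], [0.5, 0.4], [0.6, 0.5],   # red, middle band
              [0.8, 0.1], [0.9, 0.2], [0.9, 0.1]])  # blue, lower-right
y = np.array(["c1", "c1", "c1", "c2", "c2", "c2", "c1", "c1", "c1"])

# Two neurons in the hidden layer, logistic activation, L-BFGS solver,
# mirroring the settings used in the Neural Network widget above.
net = MLPClassifier(hidden_layer_sizes=(2,), activation="logistic",
                    solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)

# Predicted class and probability of c1 for each training point.
print(net.predict(X))
print(net.predict_proba(X)[:, list(net.classes_).index("c1")].round(3))
```

Changing hidden_layer_sizes to (3,) corresponds to the three-neuron hidden layer we switched to when the two-neuron network failed.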
We have yet to arrive at such large networks in this series, and there is still much to learn about classification, regression, feature ranking, feature selection, and overfitting. So stay with us.