Greetings, fellow learners! In this video we are going to talk about activation functions in neural networks, but before embarking on this journey, I have a thought-provoking question for you. When studying or learning something new, what habits help you concentrate? Is it listening to some fun lo-fi music, sipping on a hot cup of coffee, or is it something else? Comment what you think down below and let's have a discussion. This video is divided into three passes: a high-level look at activation functions, then the whys, the whats, and the hows, and finally a dive into some code as well. So let's get to it.

This is a feed-forward neural network. Each circle is a neuron that has some inputs and outputs, and each edge connects neurons and has some weight given by a scalar value. Data flows from the left, which is the input layer, to the right, which is the output layer, and this can be used to perform different tasks. So for example, it can take some information about a house, like its square footage, number of bedrooms, and so forth, and produce the price as output. Now this problem might be a little easy for the neural network, so it can do pretty well, but let's say that we change the problem and we want to train the neural network to take in an image and tell us whether that image is a dog or not. In this case, when the neural network is trained, it probably doesn't perform as well, and the reason could simply be that the relationship between the input and output here is more complex. One way a neural network can pick up on this complex interaction between inputs and outputs is by adding an activation function to each neuron. Once this new neural network with all these activation functions is trained on image classification data, the network can better pick up on patterns to produce better results.

Quiz time! Have you been paying attention? Let's quiz you to find out. What is the purpose of adding an activation function to neurons in a neural network? A, initialization of neural network weights. B, enhancement of data flow between layers. C, it allows the neural network to capture complex patterns. Or D, it allows reduction of computational complexity. Comment your answer down below and let's have a discussion. And at this point, if you do think I deserve it, please do consider giving this video a like because it will help me out a lot. Now that'll do it for quiz one and also pass one of the explanation, but keep paying attention because I'll be back to quiz you.

First, what is an activation function? Neural networks have inputs and outputs, so neural networks are functions. These inputs and outputs are easy for us humans to understand. And each neuron also has its own inputs and outputs too, so neurons are also functions. Here is a diagram of an individual neuron with inputs a, b, and c from the previous layer, and edge weights w1, w2, and w3; these weights are the parameters of the network. Internally, the neuron takes a sum of products, linearly combining the inputs a, b, and c. It's a simple function. Now, simple neurons like this can capture simple patterns in data. But after taking the sum of products, we can apply an additional function to introduce some nonlinearity into that neuron. This additional function is the activation function, and it could be any one of these functions. By adding some nonlinearity to each neuron, the neural network as a whole can understand complex patterns.
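To make that concrete, here is a minimal sketch of what a single neuron computes: a sum of products followed by an activation function. The input and weight values are made up for illustration, and ReLU stands in for whichever activation function the neuron happens to use.

```python
import torch

# Made-up inputs a, b, c from the previous layer and edge weights w1, w2, w3
inputs = torch.tensor([0.5, -1.0, 2.0])
weights = torch.tensor([0.3, 0.8, -0.5])

z = torch.dot(weights, inputs)  # sum of products: w1*a + w2*b + w3*c
output = torch.relu(z)          # activation function adds nonlinearity
```

Without the final torch.relu, the neuron is just the linear combination z; with it, the neuron's output bends, which is what lets the network as a whole capture more complex patterns.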
But to understand how exactly this is possible, we need a little bit of math. So let's say that we have a neural network with three layers: the input layer, a hidden layer with a bias neuron, and the output layer. Each of these layers will have only one neuron, just for simplicity's sake. Let's explore the first case, where there is no activation function in the neurons. Without an activation function, a little bit of math shows that the output is a linear function of the input: if the hidden neuron computes h = w1*x + b1 and the output neuron computes w2*h + b2, then the network as a whole computes (w2*w1)*x + (w2*b1 + b2), which is just another linear function. What this means is that for regression problems, it can really only fit a line. For classification problems, there is a little extra step where we add a softmax or a sigmoid to the output neuron, but even here, we can only draw simple linear decision boundaries.

Now let's take a look at the second case, where we have the same simple neural network, but each hidden neuron now has an activation function f. With an activation function, we can do some math again to show that the output, w2*f(w1*x + b1) + b2, is not necessarily a linear function of the input. What this means is that for regression problems, the model can fit nonlinear curves to the data. And for classification problems, as before, there's a little extra step where we add a softmax or a sigmoid to the output neuron, but now the model has the potential to draw nonlinear decision boundaries. For some more information on how this equation creates the decision boundary, and if you're not afraid of math, you can check out this video as supplementary material.

Quiz time! It's that time of the video again. Have you been paying attention? Let's quiz you to find out. For a neural network with five hidden layers, which of the following statements is true? A, if no neuron has an activation function, the network can recognize nonlinear patterns. B, if no neuron has an activation function, the network can only recognize linear patterns. C, if all neurons have activation functions, the network can recognize nonlinear patterns. Or D, if all neurons have activation functions, the network can only recognize linear patterns. Note here that multiple choices can be correct, so do comment your answer down below and let's have a discussion. Now that'll do it for pass two and for quiz two as well, but do keep paying attention because I'll still be back to quiz you.

Now for pass three, I wanted to walk through a Google Colaboratory notebook that shows how we can train a model without activation functions, train a model with activation functions, and then compare them to see how activation functions may or may not improve the model. We first start by importing some packages and libraries here. input_dim is a variable that describes how many features we want our dataset to have. We're going to synthesize this dataset with a function called make_classification. So essentially we have 5,000 examples, each of them with 100 features, and y holds the labels themselves. We then split the data into train and test with an 80-20 split, and convert all of them to PyTorch tensors so that they can be processed. We'll now create the neural network itself. This is going to be a very simple network: the first layer is the input layer, which has 100 neurons; this feeds a hidden layer with 16 neurons; and the output layer has just one neuron.
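Here is a minimal sketch of that data setup, assuming the standard scikit-learn and PyTorch APIs; the exact code in the notebook may differ slightly.

```python
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

input_dim = 100  # how many features each example has

# Synthesize 5,000 examples with 100 features each and binary labels
X, y = make_classification(n_samples=5000, n_features=input_dim)

# 80-20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Convert everything to PyTorch tensors so the network can process it
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)
```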
We'll define a variable called use_activation to pass in as a parameter, indicating whether or not the hidden layer should use an activation function. Accordingly, we define a forward function. This function overrides the superclass's forward function, that is, torch.nn.Module's, and is executed on the forward pass. So we pass it an input, the input may or may not go through an activation function depending on what use_activation is set to, and then, because it's a classification problem, we wrap the output in a sigmoid.

Now we're going to define a function to train and evaluate the model. First we initialize a loss, which is the binary cross-entropy loss. We then make use of an optimizer; optimizers are algorithms that actually perform the model updates, and in this case we're using the Adam optimizer. Then we have the training loop. As we iterate over every epoch, we first get the model's predictions, then compute the loss, passing in the ground truth as well as the model's predictions. Every five epochs, we print the loss. The loss.backward() step computes the gradients of the loss with respect to the parameters of the network, and optimizer.step() actually performs the parameter updates using the Adam update rule. Once the model is trained, we make predictions and then compute a very simple accuracy.

Now we initialize a neural network without any activation function and determine its accuracy, and in the second case we create the model with activation functions and determine its accuracy. When printing out the accuracy values, you'll see that with activation functions we perform slightly better than without. But note that if you execute this for different neural networks and different types of data, you might end up with situations where using an activation function leads to worse results. This could potentially be due to overfitting, in which case you might want to add regularization techniques like dropout or simplify the network itself.

Quiz time! Okay, this is going to be a fun one. In practice, adding activation functions can decrease performance for which of the following reasons? A, the relationship between input and output is complex. B, the relationship between input and output is simple. C, the model is overfitting the data. Or D, the model is underfitting the data. Note here, just like in quiz two, that multiple answers can be correct, so please do comment your answers down below and let's have a discussion. At this point again, if you do think I deserve it, please do consider giving this video a like because it will help me out immensely. And that'll do it for quiz time for this video and also pass three of the explanation, but before we go, let's get to a summary.

So an activation function is a function that allows neural networks to pick up on complex patterns between inputs and outputs. Neural networks without activation functions are relegated to linear fits in regression problems or linear decision boundaries in classification problems. In practice, overfitting with activation functions can be combated using dropout or a simpler model architecture. And that's all we have for today.
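For reference, here is a minimal sketch of the model and training routine from pass three, assuming standard PyTorch APIs; names like SimpleNet and train_and_evaluate are placeholders, and the exact notebook code in the description may differ.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, use_activation):
        super().__init__()
        self.hidden = nn.Linear(100, 16)  # input layer (100 features) -> 16 hidden neurons
        self.out = nn.Linear(16, 1)       # hidden layer -> single output neuron
        self.use_activation = use_activation

    def forward(self, x):
        h = self.hidden(x)
        if self.use_activation:
            h = torch.relu(h)              # activation on the hidden layer
        return torch.sigmoid(self.out(h))  # sigmoid because it's binary classification

def train_and_evaluate(model, X_train, y_train, X_test, y_test, epochs=50):
    criterion = nn.BCELoss()                          # binary cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters())  # Adam update rule
    for epoch in range(epochs):
        preds = model(X_train)            # model predictions
        loss = criterion(preds, y_train)  # compare predictions to ground truth
        if epoch % 5 == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")
        optimizer.zero_grad()
        loss.backward()                   # gradients of loss w.r.t. network parameters
        optimizer.step()                  # perform the parameter update
    with torch.no_grad():                 # simple accuracy on the test set
        test_preds = (model(X_test) > 0.5).float()
        return (test_preds == y_test).float().mean().item()

# Compare the two cases: no activation vs. activation on the hidden layer
acc_without = train_and_evaluate(SimpleNet(use_activation=False), X_train, y_train, X_test, y_test)
acc_with = train_and_evaluate(SimpleNet(use_activation=True), X_train, y_train, X_test, y_test)
print(acc_without, acc_with)
```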
The code and the resources used in this video will be available in the description down below, and if you want to understand more mathematical details on why activation functions work so well, I suggest you click this video for some additional content. Thank you all so much for watching, and if you do think I deserve it, please do give this video a like, subscribe for more amazing videos, and I will see you in the next one. Bye bye.