Greetings fellow learners! This is going to be the second video on how to build neural networks from scratch and understand their core. But before we get into the topic of backpropagation, I have a thought-provoking question for you. How would you go about learning something new? This could be a life experience, a specific task at work, or anything in between. Please comment your answer down below and let's have a discussion.

Now, this video is going to be divided into three passes. We'll start with an overview of backpropagation, and as the passes go on, we'll add more color and detail. So with that, let's get to it.

This here is a feedforward neural network. Each circle is a neuron that has some inputs and outputs. Each edge connects neurons and has some weight given by a scalar value. Data flows from the left, that is the input layer, to the right, which is the output layer. This network can be used to perform different tasks. For example, it could be used to determine the price of a house given information about it. Or, given an image, it could determine whether the image is a dog or not a dog.

To do these tasks, a neural network needs to be trained on a lot of data. So if we want to train a network to tell whether an image is a dog or not a dog, we would need a lot of images, where each image is labeled either dog or not dog. The network sees tens, hundreds, or thousands of these images and labels, and eventually it learns to recognize patterns. This is the training phase. Backpropagation is a part of this training phase, and we'll take a look at it in more detail in the next pass. Once the network has looked at all this data, it can make predictions of its own. This part is the inference phase.

Quiz time! Have you been paying attention? Let's quiz you to find out. In a neural network, what is the primary purpose of backpropagation? A. Making predictions on new data. B. Learning from labeled data during training. C.
Assessing the model's performance. Or D. Applying the trained model to make decisions. Comment your answer down below and let's have a discussion. At this point, if you think I deserve it, please do consider giving this video a like. That'll help me out a lot, so thank you so much. That'll do it for quiz one and pass one of this explanation, but keep paying attention, because I will be back to quiz you.

Let's go through the training phase again, but add some more details. During the training phase, the network is fed pairs of an image and a label saying whether that image is a dog or not a dog. By looking at each pair, or batches of pairs, the neural network learns. And by learning, we mean that the neural network actually updates its edge weights. But how exactly is this done? Well, it's done using three main concepts: backpropagation, optimizers, and a loss function.

Let's illustrate how they interact with each other by going through one iteration of the training phase. The neural network parameters are initialized randomly, so the network is effectively dumb. We pass an image to the network, and the network generates a number that indicates the probability of it being a dog. This prediction, along with the actual ground-truth label, is passed to a loss function to generate some scalar loss value. The loss quantifies the error of the network, and for this classification problem we can use something like a cross-entropy loss.

We want to adjust the weights of the network to minimize the loss. To perform this minimization we need some calculus: we backpropagate this loss through the network to compute the gradients of the loss with respect to the parameters of the network. This is known as the backpropagation of errors, or simply backpropagation, because the gradients are computed in the backward direction, from the last layer to the layer before it, and so on until the first layer.
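To make this concrete, here is a minimal sketch of one forward and backward pass. It shrinks the network down to a single weight with a sigmoid output and a cross-entropy loss; all the values (x, y, w) are made up for illustration, and the chain rule is written out step by step the way backpropagation would apply it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A one-weight toy "network": prediction = sigmoid(w * x).
x, y = 2.0, 1.0   # one input value and its ground-truth label (1.0 = dog)
w = 0.5           # a randomly initialized weight

# Forward pass: compute the prediction and the cross-entropy loss.
z = w * x
pred = sigmoid(z)
loss = -(y * math.log(pred) + (1 - y) * math.log(1 - pred))

# Backward pass: chain rule, from the loss back toward the weight.
dloss_dpred = -(y / pred) + (1 - y) / (1 - pred)  # d(loss)/d(pred)
dpred_dz = pred * (1 - pred)                      # derivative of sigmoid
dz_dw = x                                         # d(z)/d(w)
grad_w = dloss_dpred * dpred_dz * dz_dw

# Sanity check: compare against a numerical (finite-difference) gradient.
h = 1e-6
pred2 = sigmoid((w + h) * x)
loss2 = -(y * math.log(pred2) + (1 - y) * math.log(1 - pred2))
numerical = (loss2 - loss) / h
print(grad_w, numerical)
```

The two printed numbers agree closely, which is exactly the point: backpropagation is just an efficient, exact way of computing the same gradients you could approximate by nudging each weight.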
These gradients are then used by an optimizer to determine exactly how the neural network should be updated; the optimizer could be any of these algorithms, for example. Once the new network parameters are calculated, the network parameters are updated, and so the neural network is potentially a tiny bit better at recognizing dogs from an image. This process is repeated until the network gets better, the loss converges, and we are all satisfied.

Quiz time! It's that time of the video again. Have you been paying attention? Let's quiz you to find out. How does backpropagation assist in the training process? A. It performs the update of the parameters. B. It determines how erroneous the predictions of the neural network are. C. It calculates the gradients of the loss with respect to the parameters. Or D. None of the above. Comment your answer down below and let's have a discussion. That'll do it for quiz time for now and for pass two, but keep paying attention, because I will be back.

So backpropagation is done to compute the gradients of the loss with respect to all the parameters. But let's dive into why we need to do this. During the training phase, the prediction of the network, along with the ground truth, is passed into the loss function to generate some loss. So the loss depends on the prediction, which in turn depends on the network's weight parameters. We can plot this relationship on a graph, with the x-axis representing the parameters of the neural network and the y-axis representing the loss. But we don't know what this function really looks like. It could look like this, or this, or this, or anything else. Whatever it is, we want to adjust the neural network parameters such that the loss is the lowest. For now, let's just say that we have a U-shaped curve here for the loss as a function of the neural network parameters. Let's say that when we initialize the network parameters randomly, we are somewhere over here.
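One way to make that picture concrete: pretend the whole network has a single parameter w, and assume a hypothetical U-shaped loss, loss(w) = (w - 3)², whose minimum we place at w = 3 purely for illustration. Sampling it at a few parameter values traces out the curve we just drew.

```python
def loss(w):
    # Hypothetical U-shaped loss for a one-parameter network,
    # with its minimum placed at w = 3 (purely illustrative).
    return (w - 3.0) ** 2

# Sample the curve at a few parameter values to see its shape.
samples = {w: loss(w) for w in [0.0, 1.5, 3.0, 4.5, 6.0]}
for w, l in samples.items():
    print(f"w = {w:>4}: loss = {l}")

# The sampled point with the lowest loss is the bottom of the U.
best_w = min(samples, key=samples.get)
```

A randomly initialized network corresponds to landing at an arbitrary w on this curve, say w = 6, where the loss is high.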
And the goal is to adjust the parameters of the network to get to the bottom of the U, where the loss is the lowest. So we want to change the parameters of the network little by little, moving along this loss curve until we get to that lowest point. To do so, we compute the gradient of the loss with respect to the parameters. That's why, in all of these optimization algorithms, you'll see a formula that looks like this: the new parameter values are the old parameter values minus a learning rate times the gradient. The negative gradient, that is, minus ∂loss/∂parameters, represents the direction we want to jump along the curve, and the negative sign specifically indicates that we want to descend the curve. The learning rate defines the magnitude of the jump. This is the fundamental update rule that you see in gradient descent, which is a fundamental optimization algorithm.

If you want to implement this in PyTorch, it's actually just a single line, loss.backward(). This computes the gradients, and then, if we want to actually learn and update the parameters, it's simply optimizer.step(). If you want a more detailed look at how this works in code, you can check out my video on building your first neural network right over here. I'll link it down in the description as well.

Quiz time! This is going to be an interesting one. Have you been paying attention? Let's quiz you to find out. In the gradient descent update rule, what is the significance of the negative gradient of the loss with respect to the parameters? A. It provides the direction to update the parameters. B. It provides the magnitude to update the parameters. C. It provides the direction to update the loss. Or D. It provides the magnitude to update the loss. Comment your answer down below and let's have a discussion.
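The update rule itself can be sketched in a few lines of plain Python. This assumes the same toy one-parameter setup with a U-shaped loss (w - 3)² whose minimum sits at w = 3; the starting point and learning rate are arbitrary choices for illustration.

```python
def loss(w):
    # Assumed U-shaped loss curve with its minimum at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # d(loss)/dw, computed analytically for this toy curve.
    return 2.0 * (w - 3.0)

w = 10.0             # arbitrary starting point on the curve
learning_rate = 0.1  # magnitude of each jump down the curve

for _ in range(100):
    # new parameters = old parameters - learning_rate * gradient
    w = w - learning_rate * gradient(w)

print(w, loss(w))  # w ends up very close to 3, the bottom of the U
```

In PyTorch, loss.backward() plays the role of the gradient computation and optimizer.step() plays the role of the update line inside the loop; this sketch just spells out what those two calls do for a single parameter.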
And at this point, again, if you think I do deserve it, please consider giving this video a like, because that will help me out a lot. That will do for quiz time for this video, and also for pass three. Before we go, let's finish with a summary.

Neural networks have a training phase to learn a specific task. The training phase relies on three concepts: a loss function, optimizers, and backpropagation. The loss function is used to compute the error, or loss, of a neural network's prediction. This loss is backpropagated through the network to compute the gradients of the loss with respect to the parameters. These gradients are then used by the optimizer algorithm to update the parameters of the neural network, and hence the neural network learns.

That's all we have for today, but if you're interested in building your first neural network with PyTorch and a walkthrough of its different components, I'd recommend you check out this video right over here. Thank you all so much for watching, and if you think I deserve it, please do give this video a like, subscribe, and I will see you in the next one. Bye bye.