In this video I want to continue our talk on what to do if we have this problem of overfitting, or high variance: our model fits the training data very well, but when we look at the validation or test set, or some real-world data, we notice that the fit is very poor. The method that we're going to talk about in this video is called dropout regularization. In short, I'm just going to refer to this as dropout, so that we differentiate it from the term regularization, by which we usually mean L2 regularization, which we looked at in the preceding video in this chapter.

So what dropout does is this: it removes some of the nodes. Now, it doesn't really remove them, they are still there, but it sets the value of each of those nodes to zero. And you can well imagine that if a couple of nodes in a layer become zero, that is very much the same thing that we achieve with L2 regularization, whereby we drive the values of those parameters towards zero, giving us, during the multiplication stage, values close to zero for those nodes.

So how does it actually work? Well, there are different ways, but the usual one is what is called inverted dropout. What inverted dropout does is look at a layer, which contains a number of nodes, and some of them are simply chosen to become zero. And how does that work? Well, we create a vector with the same number of elements. Say we have one, two, three, four, five nodes; then we would create a vector of five elements, and each of them receives a random value drawn from the interval between zero and one. We set a cutoff, say 0.2: if that random value is less than 0.2, we turn that element of the vector into a zero, and if it is 0.2 or more, it becomes a one. So we end up with a random vector of zeros and ones, and then we do an element-wise multiplication with the layer's values. Therefore every value that remains in that layer after activation, and remember, this happens after activation, is either the actual value or a zero.

Now, when all of this randomizing happens, we actually subtract that cutoff value that we decided on, the 0.2, from one, and that gives us 0.8, which is what we refer to as our keep probability. In essence, you can think of it like this: we're going to keep 80% of these nodes at random, and 20% are going to drop out. So if our cutoff is 0.2, the keep probability is 0.8; if it is 0.3, the keep probability is 0.7. I'm just going to refer to that keep probability as the kappa value, so here our kappa value is 0.8.
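To make the mask idea concrete, here is a minimal NumPy sketch of that thresholding and element-wise multiplication. The five layer values and the 0.2 cutoff are just illustrative assumptions, and the compensating scaling step discussed next is deliberately left out here.

```python
import numpy as np

# A hypothetical layer of five post-activation values (the numbers are made up).
a = np.array([0.7, 1.2, 0.3, 2.1, 0.9])

drop_cutoff = 0.2                # the cutoff chosen in the video
keep_prob = 1.0 - drop_cutoff    # the "kappa" value: 0.8

# One uniform random number per node; below the cutoff -> 0 (dropped),
# at or above the cutoff -> 1 (kept).
mask = (np.random.rand(*a.shape) >= drop_cutoff).astype(float)

# Element-wise multiplication zeroes out the dropped nodes.
a_dropped = a * mask
print(mask)
print(a_dropped)
```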
Now, with inverted dropout, there is actually one more step. Remember that when we do the forward propagation, we multiply our matrix of parameters (the coefficients) with the column vector of values coming from the previous layer, and once that multiplication takes place we add all of those products together to give us the node's value. But with dropout, some of the terms in that addition are now going to be zero, and if we imagine a rectified linear unit activation for this node, its value is now going to be smaller than it would have been before, because some of the contributions from those inputs are now zero. We have to compensate for that somehow, and the way we compensate is that each of the values we have after the element-wise multiplication gets divided by the kappa value. That is very important: we divide all of them by kappa so that when we do the addition, before we apply our rectified linear unit activation for instance, we are on a similar scale to what we would have had if there were no zeros. It is a very important step, and if we write the code it is actually going to happen automatically, but I think it is important to realize that without it we have a scaling problem, where the value going into the activation would be different. So we divide these values by kappa (there is a short sketch of this step at the end of this section).

The effect of all of this is that some nodes might start to overwhelm the system: they become more and more important during the training phase, and then the model fits the training data very well, but it is a falsehood, and we want to try and prevent that. So there is this random chance of removing some of these nodes, and as you can imagine, once again we constrain the hypothesis space and thereby create a simpler model, which might then fit our test, validation, or real-world data a lot better.

So, in very short, that is dropout. Those are just the key thoughts on dropout, just to understand what is happening: we are simply going to set some of these values to zero. You needn't be too concerned about the technicalities; it is a simple addition to our code. And, very excitingly, in the next video I'm going to show you how to implement both L2 regularization and dropout. We're going to see how that affects the data and how it is going to at least try to fix the problem of overfitting that exists in the data set that we are going to use.
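To tie the pieces together, here is a minimal sketch of one layer's forward step with inverted dropout, including the division by kappa described above. The function name, the shapes, and the training flag are assumptions made for illustration, not the code from the upcoming video.

```python
import numpy as np

def dense_relu_with_inverted_dropout(W, b, x, keep_prob=0.8, training=True):
    # One layer: multiply the parameter matrix with the previous layer's
    # vector, add the bias, then apply the rectified linear unit activation.
    z = W @ x + b
    a = np.maximum(0.0, z)

    if training:
        # Keep each node with probability kappa (= keep_prob), drop the rest.
        mask = (np.random.rand(*a.shape) < keep_prob).astype(float)
        # Divide by kappa so the sums feeding the next layer stay on the
        # scale they would have had without any zeros.
        a = a * mask / keep_prob
    return a

# Tiny usage example with made-up shapes and values.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))      # hypothetical parameter matrix: 3 nodes, 5 inputs
b = np.zeros(3)
x = rng.normal(size=5)           # output of the previous layer
print(dense_relu_with_inverted_dropout(W, b, x, keep_prob=0.8))
```

Because the compensating division is baked into the training pass, nothing extra has to be done when making predictions, which is exactly why this variant is called inverted dropout.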