This week we're talking about regularization, and one of the key themes is going to be that regularization is shrinkage. Or, the other way around, shrinkage is regularization. The smaller the weights in the model, the more you regularize. We'll see lots of ways to shrink the parameters: penalties like the L2 (ridge) penalty, zeroing some of the weights out, starting with tiny weights initialized randomly around zero and gradually letting them grow by gradient descent as we learn. All of these methods lead to smaller weights, and smaller weights regularize.

But what does that mean in practical terms? It means they give smoother models. If you have large weights, then the output as a function of the input, y as a function of x, is more wiggly, more bouncy. Smaller weights also mean lower capacity: the network can hold less information. As the weights get bigger, the network's capacity rises, and with it both its ability to fit signal and its ability to fit noise. So smaller weights shrink the model, and a shrunken model fits the noise less well. Now, this could be good or bad, right? Shrink too much and you underfit; shrink too little and you overfit.

Cool. So let's look at a simple demonstration of this. I've taken four data points and fit a third-order polynomial. If you do no regularization, the blue line, you can see we fit the four points perfectly, but maybe we should do some shrinkage. So let's fit that model again using a ridge penalty. If you use a moderate ridge penalty, 0.5, you get the orange line: nice and smooth, and it looks like a decent approximation to a cubic. If you use a huge ridge penalty, say 5000, you get the green line: nice and smooth, but very flat. The blue line, no regularization, no shrinkage, bigger parameters, has high variance, each set of four points you fit gives a very different model, but low bias, you fit the points perfectly. If you do big shrinkage, you make the weights much smaller, and those smaller weights give you the green line. In that case you have high bias, you tend to always underestimate the true values of y, but low variance: every training set you throw at this gives you a line that looks pretty much the same, and they all make similarly flat, low predictions. So as you go through the week, watch to see how each of the different regularization methods is shrinking the parameters.
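To make the demo concrete, here is a minimal sketch of that kind of fit, assuming scikit-learn is available. The four data points are made up purely for illustration; only the setup (a cubic fit to four points) and the penalty strengths (roughly 0, 0.5, and 5000) come from the example above.

```python
# Minimal sketch: fit a third-order polynomial to four points with
# different amounts of ridge (L2) shrinkage. The data points below are
# invented for illustration; the penalty values follow the demo.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[0.0], [1.0], [2.0], [3.0]])   # four training inputs (assumed)
y = np.array([1.0, 3.0, 2.0, 5.0])           # four training targets (assumed)

# Expand x into polynomial features x, x^2, x^3 (the intercept is fit separately).
X = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)

for alpha in [1e-9, 0.5, 5000.0]:            # ~no shrinkage, moderate, huge
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>8}: coefficients={np.round(model.coef_, 3)}")
    # As alpha grows, the coefficients shrink toward zero and the fitted
    # curve becomes smoother and flatter (higher bias, lower variance).
```

Running it, you should see the coefficients shrink toward zero as the penalty grows, which is exactly the blue-to-orange-to-green progression described in the demo.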