One of the easiest and most widely used methods for regularization is simply early stopping. We'll see more sophisticated ways to use gradient descent to control regularization later, but for now just note this: we start our neural network training with small initial weights, and as gradient descent proceeds, those weights grow larger and larger. So if you stop at the point where the weights are the optimal size, where overfitting is minimized, then they end up appropriately small compared to how large they would have grown had you trained to full convergence.
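To make that concrete, here is a minimal sketch of early stopping driven by validation loss. It's an illustration under assumptions, not a definitive recipe: `train_one_epoch`, `eval_val_loss`, and the `get_weights`/`set_weights` methods are hypothetical stand-ins for whatever training framework you're actually using.

```python
import copy

def train_with_early_stopping(model, train_data, val_data,
                              max_epochs=100, patience=5):
    """Stop training once validation loss hasn't improved for `patience` epochs,
    then roll back to the best weights seen so far."""
    best_loss = float("inf")
    best_weights = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)         # hypothetical: one pass of gradient descent
        val_loss = eval_val_loss(model, val_data)  # hypothetical: loss on held-out data

        if val_loss < best_loss:
            best_loss = val_loss
            best_weights = copy.deepcopy(model.get_weights())  # snapshot best model so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss rising: the weights have grown past the sweet spot

    model.set_weights(best_weights)  # restore the snapshot with the best generalization
    return model
```

The `patience` parameter keeps a single noisy validation measurement from ending training too soon; snapshotting the best weights means the final model corresponds to the point where overfitting was minimized, not to wherever training happened to halt.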