Another clever way to do regularization, one that doesn't quite feel like regularization, is to augment the data with synthetic variations of it. We never have enough data, so often we can transform the data we have in ways that make it look like new data. This has been perfected by the people who do image recognition. Suppose you have a picture of a cat, and of course the web has lots of pictures of cats. You can crop it so that the patch you train on is a small piece of the original. You can rotate it 90 degrees, though usually not flip it upside down, because cats are usually not upside down. You can shift it a couple of pixels right, a couple of pixels left, a couple of pixels down, generating new images each time. Note that in the original pixel space, a cat picture shifted by three pixels looks quite different: every whisker has moved. You've got a new labeled image that differs from the original ones. You can also add a small amount of noise: add Gaussian noise to the picture, or take a small fraction of the pixels, say 10%, and zero them out.

Perhaps surprisingly, these tricks are in many ways mathematically similar to L2 regularization. All of them let you build models that are not so overfit to the small number of images you have labeled, even if that number is 100 million. Give it a try.
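As a minimal sketch of these augmentations with NumPy (the `augment` helper and its parameters are illustrative, not a standard API):

```python
import numpy as np

def augment(image, rng):
    """Return synthetic variants of one labeled image: a 90-degree
    rotation, small shifts, Gaussian noise, and pixel dropout."""
    variants = []
    # Rotate 90 degrees -- but no upside-down flips; cats are rarely upside down.
    variants.append(np.rot90(image))
    # Shift a couple of pixels right, left, and down. np.roll wraps pixels
    # around the edge; a zero-padded shift is another common choice.
    for shift, axis in [(2, 1), (-2, 1), (2, 0)]:
        variants.append(np.roll(image, shift, axis=axis))
    # Add a small amount of Gaussian noise to every pixel.
    variants.append(image + rng.normal(0.0, 0.05, size=image.shape))
    # Zero out roughly 10% of the pixels at random.
    dropped = image.copy()
    dropped[rng.random(image.shape) < 0.10] = 0.0
    variants.append(dropped)
    return variants

# Each call turns one labeled image into six extra training examples,
# all of which keep the original label ("cat").
cat = np.random.default_rng(0).random((32, 32))
extra_examples = augment(cat, np.random.default_rng(1))
```

Each variant keeps the original label, so one labeled picture becomes several training examples for free.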