What is regularization trying to do with linear regression? Start from the residual sum of squares: y_i is the actual label, and this yellow term is the prediction. If a learned parameter theta_i is way too large, then even small changes in the input x lead to large changes in the prediction. That is overfitting. We can curb it with a penalty term, in the form of either ridge or lasso. Say theta_1 and theta_2 are two learned parameters of a model, and suppose the values that give the lowest mean squared error are some very large values of theta_1 and theta_2. That is indicative of overfitting. With ridge regression and lasso, we constrain theta_1 and theta_2 to a small region around the origin. For ridge, that region is a circle, because the penalty is the sum of squared parameters, hence this equation. And for lasso, it is a diamond, because the penalty is the sum of absolute values of the parameters, hence this equation.
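The shrinkage described above can be seen directly by fitting the same data three ways. The sketch below uses scikit-learn's `LinearRegression`, `Ridge`, and `Lasso`; the synthetic data, the feature count, and the `alpha` values are my own illustrative choices, not something from the lecture. Ridge should pull all coefficients toward zero, while lasso should set the irrelevant ones exactly to zero (the corners of the diamond).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# Only the first two features actually matter; the other three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)       # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)      # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty: zeroes some coefficients

print("OLS  :", np.round(ols.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
print("Lasso:", np.round(lasso.coef_, 3))
```

Comparing the printed coefficients, the ridge vector has a smaller overall magnitude than the OLS vector, and the lasso fit drives the three irrelevant coefficients to exactly zero, which is why lasso is often used for feature selection.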