Why can lasso drive a coefficient exactly to zero, while ridge regression only gets it near zero? Consider the mean squared error loss where we have just one example, the input x1 equals one, and there is no intercept term, so the prediction is simply theta1. We can then take the derivative of the regularized loss with respect to theta1 and set it equal to zero. Doing this gives us a relationship between theta1, the response y, and lambda. In the plot, the x-axis is the response y and the y-axis is the coefficient theta1. The red line is ridge regression; the piecewise linear curve with three segments is lasso. With lambda at zero, there is no regularization. As we increase lambda, ridge regression shrinks theta1 toward zero but never makes it exactly zero, whereas lasso sets theta1 exactly to zero for a whole range of values of y around zero.
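As a sketch of that derivation (my notation, not taken verbatim from the lecture): assuming the penalized losses are (y - theta1)^2 + lambda * theta1^2 for ridge and (y - theta1)^2 + lambda * |theta1| for lasso, setting the derivative (or subgradient) to zero gives closed forms, and lasso's solution is the soft-thresholding rule that is exactly zero whenever |y| <= lambda / 2.

```python
import numpy as np

def ridge_theta(y, lam):
    # Minimizer of (y - theta)^2 + lam * theta^2:
    # derivative -2*(y - theta) + 2*lam*theta = 0  =>  theta = y / (1 + lam).
    # Shrinks toward zero but is never exactly zero unless y is zero.
    return y / (1.0 + lam)

def lasso_theta(y, lam):
    # Minimizer of (y - theta)^2 + lam * |theta| (soft-thresholding):
    # exactly zero for |y| <= lam / 2, linear in y outside that band,
    # giving the piecewise linear curve with three segments.
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

lam = 1.0
ys = np.linspace(-3, 3, 7)
print(ridge_theta(ys, lam))  # small but nonzero for every nonzero y
print(lasso_theta(ys, lam))  # exactly 0.0 for |y| <= 0.5
```

This makes the picture from the lecture concrete: the ridge solution is a straight line through the origin with slope 1/(1 + lambda), while the lasso solution has a flat segment at zero between -lambda/2 and lambda/2.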