A model is a story we tell about our data. It's always different from the data itself: a simplified version, a cartoon picture with bold edges and even shading. There are always a lot of stories that can be told about the same data set.

When we're building a model, we're making the assumption that our data has two parts: signal and noise. The signal is the real pattern. It's the repeatable process that we hope to capture and describe, the information that we care about, and it's what lets the model generalize to new situations. The noise is everything else that gets in the way of that: imperfections in our sensors, things typed in wrong, variations driven by forces that we can't or don't try to model. It's just all the other stuff.

It's easy to picture the difference between signal and noise if you imagine listening to your favorite playlist in the middle of winter while a heater is blasting nearby. The music is the signal. That's the thing you want to track and absorb. The heater fan is the noise, the additional variation piled on top of the signal. If it gets too loud, it becomes impossible to follow the flow of the signal.

The goal of a model is to describe the signal despite the noise. A perfect model describes the signal exactly and ignores all of the noise. If a model fails to capture all of the signal, that type of error is called bias. If a model captures some of the noise, that type of error is called variance.

Too much bias in our model means that it will perform poorly in all situations because it hasn't captured the signal well. You may also hear this called underfitting. This was the case when we fit a straight line to our temperature data: it didn't capture the underlying pattern well, and because of that it had a much higher error than the rest of our candidates. A linear model fit to this data has high bias.

Too much variance in our model will also cause it to fail. It won't generalize well.
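To make the bias idea concrete, here's a minimal sketch. This isn't the course's actual temperature data; it's a made-up curved signal plus noise, used to show how a straight line underfits a curve no matter how the noise falls:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for the temperature data: a curved signal plus noise.
x = np.linspace(0, 10, 50)
signal = 5 + 3 * x - 0.3 * x**2                   # the repeatable pattern
y = signal + rng.normal(scale=1.0, size=x.size)   # noise piled on top

def mse(y_true, y_pred):
    """Mean squared error between observations and model predictions."""
    return np.mean((y_true - y_pred) ** 2)

# A straight line (degree 1) can't follow the curve: high bias, underfitting.
line = np.polyval(np.polyfit(x, y, 1), x)
# A quadratic (degree 2) matches the shape of this signal much more closely.
quad = np.polyval(np.polyfit(x, y, 2), x)

print("straight line MSE:", mse(y, line))
print("quadratic MSE:   ", mse(y, quad))
```

The straight line's error stays large even if we collect more data, because that error comes from the model's shape, not from the noise.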
Instead of capturing just the pattern we care about, it will also capture a lot of the extraneous noise that we don't care about. The patterns in the noise will be different from situation to situation, so when we try to generalize and apply our model to a new situation, it will have extra error. This is also called overfitting. The more complex our model, the greater the risk of overfitting. This was the case with our connect-the-dots interpolation model.

The way to protect against underfitting is to try several different types of models. Each model has its own strengths and weaknesses: patterns that it finds more easily and more accurately than others, and other patterns that it struggles with. By trying a variety of models, we have the best chance of pulling out the patterns hidden in our data. Our best defense against bias is a rich pool of candidate models.

To protect against overfitting, we make sure to test how well each trained model generalizes. After training it on one set of observations, we use it to predict the pattern in another set of observations. If the predictions are accurate, then we know our model is good and our variance is low. But if it makes much worse predictions on the test data than on the training data, then we know that the model overfit the training data, capturing a lot of noise rather than just the signal.

Okay, we are really getting warmed up now. Next, in part three, we'll consider the question that can make or break our model selection: choosing the right error function.
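The train-and-test check described above can be sketched in a few lines. This is an illustration on synthetic data (a sine wave plus noise stands in for the signal, and the polynomial degrees and alternating split are just for demonstration), not the course's own models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a smooth signal plus noise, split into training and test
# sets by taking alternating points.
x = np.linspace(-1.0, 1.0, 41)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=x.size)
x_train, y_train = x[0::2], y[0::2]
x_test, y_test = x[1::2], y[1::2]

def mse(y_true, y_pred):
    """Mean squared error between observations and model predictions."""
    return np.mean((y_true - y_pred) ** 2)

# A very flexible model (high-degree polynomial) can chase the noise in the
# training set. The giveaway is a test error much worse than training error.
for degree in (3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    err_train = mse(y_train, np.polyval(coeffs, x_train))
    err_test = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree {degree:2d}: train MSE {err_train:.3f}, test MSE {err_test:.3f}")
```

The degree-15 fit drives its training error below the noise level by memorizing the training points, and pays for it on the held-out points: exactly the gap between training and test error that signals high variance.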