When we go to fit a model to some data, it's easy to overlook the choice of error function. Fitting a model is an exercise in optimization: finding the set of parameter values that minimizes a loss function. If you need a refresher on all that, check out the How Optimization Works series. There's a link in the comments below.

The difference between a model and a measured data point is called the deviation, and the error function expresses how much we care about a deviation of a certain size. Are small errors okay but large errors really bad, or is being off by a little just as bad as being off by a lot? In business terms, we can think of the error function as how much it costs us in dollars to be wrong by a certain amount. In fact, error functions are also called cost functions. The choice of error function depends entirely on how our model is going to be used.

First, let's think about the case where our temperature predictions are being used to design a greenhouse. The thickness of the glass and the amount of insulation around the base of the greenhouse are carefully selected to create the ideal growing environment. There won't be any heaters or air conditioners to modulate the temperature, just passive heat flow determined by the design of the greenhouse. The plants are hardy and can tolerate being off by a few degrees in either direction fairly well. It stunts them a little, but not catastrophically. The further the temperature gets from the ideal, though, the more detrimental it is to the health of the plants, and the effects quickly become more severe. This suggests that the cost function is something like the square of the deviation.

Now consider a second use case. We're designing a greenhouse again, but this time we're including heaters and coolers. This means we'll be able to modulate the temperature to make it suitable for the plants, but the more heating and cooling we do, the more energy we'll have to buy and the more money we'll have to spend on it. So the cost of being off in our prediction is related to how much it costs to correct for it: the energy cost of bringing the temperature back to the appropriate range. This suggests an error function where the cost is proportional to the absolute value of the deviation. All of the models fit to our temperature data in part one used an absolute deviation error function.

Now let's look at a third use case. Here our temperature forecasts are being used to make decisions about when to preheat or precool an office building for a workday. Preheating and precooling during the night takes advantage of lower nighttime energy prices and saves the company money. The cost of any deviation is the additional cost of daytime peak energy. This is proportional to the amount of time the equipment runs during the day, which in turn is directly proportional to the prediction error. However, above a certain prediction error, no amount of daytime heating or cooling will fully make up the difference; the equipment just runs all day, so the cost has a ceiling. This suggests an error function of absolute deviation, but with saturation.

Finally, a fourth use case: our temperature predictions are being used in television weather forecasts. Our viewers don't expect the predictions to be exact, so if they're off by a little bit, there's no penalty. This gives us a don't-care region: there's no cost to being wrong by a small amount. However, if the temperatures are off by much more than that, the viewers become very upset and are likely to switch to another television station for their weather reports. A quadratic curve outside the don't-care region gives us the steeply increasing cost associated with this.
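To make the four candidates concrete, here's a rough sketch of what each error function might look like in code. This is just one way to write them (Python with NumPy assumed), and the ceiling and don't-care halfwidth values are placeholder choices, not numbers from our temperature project:

```python
import numpy as np

# Each function maps a deviation (model prediction minus measurement)
# to a cost. All of them are vectorized, so arrays of deviations work too.

def squared_error(deviation):
    # Use case 1: passive greenhouse. Small misses are tolerable,
    # but the cost grows quickly as the deviation grows.
    return deviation ** 2

def absolute_error(deviation):
    # Use case 2: heated and cooled greenhouse. Cost is proportional
    # to the energy needed to correct the miss.
    return np.abs(deviation)

def saturated_absolute_error(deviation, ceiling=5.0):
    # Use case 3: office preheating/precooling. Cost tracks the absolute
    # deviation until the equipment runs all day, then flattens out.
    # The ceiling of 5.0 is an arbitrary placeholder.
    return np.minimum(np.abs(deviation), ceiling)

def deadband_quadratic_error(deviation, halfwidth=2.0):
    # Use case 4: TV forecasts. No penalty inside the don't-care region,
    # then a steeply rising quadratic cost outside it.
    # The halfwidth of 2.0 is an arbitrary placeholder.
    excess = np.maximum(np.abs(deviation) - halfwidth, 0.0)
    return excess ** 2

deviations = np.linspace(-10, 10, 5)
for fn in (squared_error, absolute_error,
           saturated_absolute_error, deadband_quadratic_error):
    print(fn.__name__, fn(deviations))
```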
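To preview a point we'll come back to below, that different error functions produce different best-fit parameters, here's a continuation of that sketch that fits a toy temperature model under each one. It reuses the four functions defined above; the made-up data, the simple sinusoidal model, and the choice of Nelder-Mead are all illustrative assumptions, not part of the original analysis:

```python
import numpy as np
from scipy.optimize import minimize

# Assumes the four error functions from the previous sketch are defined.

# Toy hourly "temperature" data with one bad sensor reading, standing in
# for the real data set from part one.
rng = np.random.default_rng(0)
hours = np.linspace(0, 24, 25)
temps = 15 + 8 * np.sin(2 * np.pi * (hours - 9) / 24)
temps += rng.normal(0, 1, hours.size)
temps[5] += 20.0  # the outlier

def model(params, x):
    # A deliberately simple stand-in model: an offset plus a sinusoid.
    offset, amplitude = params
    return offset + amplitude * np.sin(2 * np.pi * (x - 9) / 24)

def total_cost(params, error_fn):
    # The quantity optimization minimizes: the summed cost of all deviations.
    deviations = model(params, hours) - temps
    return np.sum(error_fn(deviations))

for error_fn in (squared_error, absolute_error,
                 saturated_absolute_error, deadband_quadratic_error):
    # Nelder-Mead is used because several of these losses aren't smooth.
    result = minimize(total_cost, x0=[10.0, 5.0], args=(error_fn,),
                      method="Nelder-Mead")
    print(f"{error_fn.__name__:>28}: offset={result.x[0]:6.2f}, "
          f"amplitude={result.x[1]:6.2f}")
```

With the planted outlier, you should see the squared-deviation fit pulled toward the bad reading, while the absolute and saturated fits largely shrug it off. That difference is the whole point of this section.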
These are four fairly straightforward use cases, but we can handle much more complex ones too. Imagine that our top-notch business analytics team determines that our energy costs have a complicated relationship to prediction errors, something that looks like this. That's not a problem either; we can use it just as easily as any of the other candidates we've looked at (there's a sketch of this at the end of this section). The only real constraint on our error function is that it doesn't decrease as the deviation gets further from zero. It can follow any pattern we want as long as it always increases or at least remains flat.

The choice of error function makes a huge difference in which model will fit the best and what the parameter values for that model will be. Each one of these error functions would produce a different set of best-fit parameters in our temperature model. The best-fit curve will be different each time. Starting with the right error function can make a big difference in how useful our model is. The wrong error function can give us a model that's worse than useless.

Keep your eyes open for squared deviation as an error function. It's a very common choice, so common in fact that inexperienced modelers might assume it's the only choice. It has some really nice properties for mathematical analysis, and for that reason it's favored for theoretical and academic work. But other than that, there's nothing that makes it special. Chances are it's not right for your application. Any time you spend carefully choosing your error function will be richly rewarded.

Now that we have a solid foundation, in part 4 let's take a close look at splitting the data into training and testing data sets. It's harder than it looks.
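As promised above, here's a minimal sketch of using an arbitrary, empirically determined cost curve as an error function. Suppose the analytics team hands us their curve as a table of points rather than a formula; the numbers below are invented for illustration, and the only requirement is that the cost never decreases as the size of the deviation grows:

```python
import numpy as np

# Hypothetical cost table from the analytics team:
# (|deviation| in degrees, cost in dollars). Made-up numbers.
deviation_points = np.array([0.0, 1.0, 3.0, 5.0, 10.0])
dollar_points = np.array([0.0, 0.5, 4.0, 4.5, 12.0])

def custom_error(deviation):
    # Linear interpolation between the tabulated points, applied to the
    # magnitude of the deviation. np.interp holds the last value flat
    # beyond the table, which keeps the function non-decreasing.
    return np.interp(np.abs(deviation), deviation_points, dollar_points)

print(custom_error(np.array([-7.0, -0.5, 0.0, 2.0, 20.0])))
```

A function like this can be handed to the fitting sketch earlier in exactly the same way as the four standard candidates.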