Why do we need data validation in machine learning? Say we have a data set and we use all of it to train a machine learning model. When we evaluate the model on that same data, we get a mean squared error of 8.3. But when we actually use the model in the real world, it performs much worse. The reason is that the 8.3 was measured on data the model had already seen during training, so it is an overly optimistic estimate. This is where data validation comes into play: it lets us estimate the error the model will actually have in the real world. For example, if we set aside part of the data for training and test only on the remaining data, which the model has never seen, we get a much better sense of how the model will truly perform in the real world.
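To make the train/test idea concrete, here is a minimal Python sketch. It assumes NumPy and synthetic data. The noisy linear target, the 80/20 split, and the helper names `holdout_split`, `predict_1nn`, and `mse` are illustrative choices and do not come from the original. It uses a 1-nearest-neighbour regressor because that model memorizes its training data, which makes the gap between the two error estimates obvious: on the training set the error is zero, while the held-out set shows the real error.

```python
import numpy as np

# Hypothetical helpers written for illustration (not a library API); data is synthetic.

def holdout_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

def predict_1nn(x_train, y_train, x_query):
    """1-nearest-neighbour regression: copy the target of the closest training point."""
    dist = np.abs(x_query[:, None] - x_train[None, :])
    return y_train[np.argmin(dist, axis=1)]

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data: a noisy linear relationship (assumed for this sketch).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 3.0, size=200)

x_tr, y_tr, x_te, y_te = holdout_split(x, y)

# Error on data the model has already seen (optimistic).
train_mse = mse(y_tr, predict_1nn(x_tr, y_tr, x_tr))
# Error on held-out data the model never saw (closer to real-world performance).
test_mse = mse(y_te, predict_1nn(x_tr, y_tr, x_te))

print(f"train MSE: {train_mse:.2f}")
print(f"test  MSE: {test_mse:.2f}")
```

Running it prints a training MSE of 0.00 and a much larger test MSE. Only the second number tells us anything about how the model would do on new data.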