Let’s consider we have a dataset residing on an S-shaped curve such as a logarithmic curve. Fitting a high-order parabola passing through the known points with zero error is always possible. On the other hand, we can fit a straight line with a high error rate.
The first solution generates an overly complex model and models the implicit noise as well as the dataset. As a result, we can expect a high error for a new data point on the original S-shaped curve.
Conversely, the second model is far too simple to capture the relationship between the input and output. Hence, it will perform poorly on new data, too: