It is not uncommon to deliberately overfit the model at first and then scale it back (for example, with regularization or reduced capacity) so that it generalizes well to unseen data. Focus on getting the best validation score you can rather than on eliminating overfitting completely.
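A rough sketch of that workflow, in case it helps. Everything here is an assumption for illustration: the synthetic sine data, the degree-12 polynomial, the ridge penalty as the way of "scaling back", and the helper names `fit_ridge_poly` and `mse`. The idea is to start with a model that has far more capacity than the data needs, sweep the regularization strength, and keep whichever setting scores best on the validation split.

```python
import numpy as np

def fit_ridge_poly(x, y, degree, alpha):
    """Polynomial ridge regression via the regularized normal equations.

    For simplicity the penalty is applied to every coefficient,
    including the constant term.
    """
    X = np.vander(x, degree + 1, increasing=True)
    A = X.T @ X + alpha * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)

def mse(w, x, y):
    """Mean squared error of the polynomial with coefficients w."""
    X = np.vander(x, len(w), increasing=True)
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (hypothetical): a noisy sine curve, split in half.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

degree = 12  # deliberately more capacity than needed, to overfit first
alphas = [1e-8, 1e-6, 1e-4, 1e-2, 1e-1, 1.0, 10.0]

# Scale back step by step and record both scores for each setting.
results = []
for alpha in alphas:
    w = fit_ridge_poly(x_train, y_train, degree, alpha)
    results.append((alpha, mse(w, x_train, y_train), mse(w, x_val, y_val)))

for alpha, train_err, val_err in results:
    print(f"alpha={alpha:<8g} train MSE={train_err:.4f}  val MSE={val_err:.4f}")

# Select on validation score, not on the size of the train/val gap.
best_alpha, best_train, best_val = min(results, key=lambda r: r[2])
print(f"best alpha by validation MSE: {best_alpha:g}")
```

The selected setting may still show a gap between training and validation error. That is fine: the selection criterion is the validation score itself.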
agreed.