Part 6/10:
Epoch AI faces additional challenges on the validation side. Math papers typically contain minor errors that do not undermine their overarching theorems, but benchmark problems demand stricter scrutiny. The benchmarking team relies on a cadre of competent mathematicians who review problems within their areas of expertise, and it aims to shift from passive reviewing to a more engaged process in which reviewers must solve the problems themselves to confirm their accuracy.