The Current Testing Dilemma
The challenge facing AI evaluation is multifaceted. Modern large language models (LLMs) such as Google's Gemini and OpenAI's latest offerings already excel at conventional tests in fields ranging from intelligence to law. But this success raises a crucial question: are these achievements meaningful if the systems may have already encountered the test content during training?
The problem is set to intensify. Epoch AI projects that by 2028, AI systems will have effectively processed all human-written content. Once that milestone is reached, it becomes fundamentally harder to assess AI capabilities accurately, because any new test risks drawing on material the models have already seen.