Part 6/8:
Benchmarks of Deep Research
Deep Research is built on advanced models trained via end-to-end reinforcement learning across diverse domains. The tool can execute complex reasoning tasks, find relevant real-time information, and generate insightful outputs, all of which help it perform well on established benchmarks.
On Humanity’s Last Exam, a benchmark that tests AI on expert-level questions, Deep Research answered 26.6% of questions correctly, more than double the score of its predecessors. This result underscores the power of combining raw intelligence with real-time information retrieval.