Model Improvements: Raising the Bar
Anthropic has released two updated models: Claude 3.5 Sonnet (new) and Claude 3.5 Haiku. The new Sonnet version shows across-the-board improvements over its predecessor, with particularly notable gains in coding, an area where Claude was already considered an industry leader.
Benchmark results paint a compelling picture of the new Claude 3.5 Sonnet's capabilities:
- Graduate-level reasoning (GPQA): Improved from 59% to 65%
- MMLU Pro: Advanced from 75% to 78%
- Math problem solving: Substantial jump from 71% to 78%
- High school math competitions: 16% using zero-shot chain of thought, nearly double the previous version's score
- Agentic coding: Significant increase from 33% to 49%
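To put these figures on a common footing, it helps to separate the absolute gain (percentage points) from the relative gain (percent of the old score). The short Python sketch below does that arithmetic using only the rounded numbers quoted in the list above; the `gains` helper and the `scores` dictionary are illustrative names, not part of any Anthropic tooling, and the high school math entry is left out because the list gives no exact earlier score for it.

```python
# Rounded scores as quoted in the list above: (previous %, new %).
scores = {
    "Graduate-level reasoning": (59, 65),
    "MMLU Pro": (75, 78),
    "Math problem solving": (71, 78),
    "Agentic coding": (33, 49),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100 * absolute / before
    return absolute, relative


for name, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative:.0f}% relative)")
```

On these rounded inputs, agentic coding stands out: a 16-point gain that is roughly a 48% relative improvement, while the other three benchmarks move by 3 to 7 points.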