RE: LeoThread 2024-10-22 21:22

Model Improvements: Raising the Bar

Anthropic has released two updated models: Claude 3.5 Sonnet (new) and Claude 3.5 Haiku. The new Sonnet shows across-the-board improvements over its predecessor, with particularly notable gains in coding, an area where Claude was already considered an industry leader.

Benchmark results paint a compelling picture of the new Claude 3.5 Sonnet's capabilities:

Graduate-level reasoning (GPQA): Improved from 59% to 65%
MMLU Pro: Advanced from 75% to 78%
Math problem solving: Substantial jump from 71% to 78%
High school math competitions: 16% using zero-shot chain of thought, nearly double the previous version's score
Agentic coding: Significant increase from 33% to 49%
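
To put these numbers side by side, here is a small sketch that turns the before/after scores quoted above into percentage-point gains and relative gains. The dictionary and the gains helper are my own illustration, not anything from Anthropic, and the high school math entry is left out because the post gives no explicit previous score for it.

```python
# Before/after scores (percent) as quoted in the list above.
# The high school math entry is omitted: no explicit prior score is given.
scores = {
    "Graduate-level reasoning": (59, 65),
    "MMLU Pro": (75, 78),
    "Math problem solving": (71, 78),
    "Agentic coding": (33, 49),
}


def gains(before, after):
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    relative = 100 * points / before
    return points, relative


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, {relative:.1f}% relative")
```

Seen this way, agentic coding stands out: 16 points on a base of 33 is a much larger relative jump than the 3 points on a base of 75 for MMLU Pro.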