RE: LeoThread 2024-10-22 21:22

in LeoFinance · 6 months ago

While these improvements are impressive, competitors still hold the edge in some areas. For instance, Google's Gemini 1.5 Pro still leads in math problem solving at 86.1%. Notably absent from the benchmark comparisons was OpenAI's o1 model, likely because of its different operational approach, which involves longer "thinking" time.

The Claude 3.5 Haiku update is also significant: the smaller model now outperforms the previous-generation Claude 3 Opus, demonstrating that Anthropic can get better results out of smaller, more efficient models.
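For anyone who wants to try the new Haiku model once it's available, here's a minimal sketch using Anthropic's Python SDK. The model ID string is my assumption based on the announcement date and may differ from the identifier Anthropic actually ships:

```python
# Minimal sketch using Anthropic's Python SDK (pip install anthropic).
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    # Assumed model ID for the new Haiku; check Anthropic's docs
    # for the final released identifier.
    model="claude-3-5-haiku-20241022",
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize the Claude 3.5 updates."}],
)

print(response.content[0].text)
```

Swapping in the cheaper Haiku model is just a matter of changing that one `model` string, which is part of why efficiency gains at the small end of the lineup matter so much in practice.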