You are viewing a single comment's thread from:

RE: LeoThread 2025-02-13 22:13

It’s also a bit worse than DeepSeek at coding, scoring 68.9 points versus 71.2, but since the model is open source, these scores could improve drastically once people start building on it.

What set this achievement apart was its efficiency: OpenThinker required only 114,000 training examples to reach these results, while DeepSeek used 800,000.

The OpenThoughts-114k dataset came packed with detailed metadata for each problem: ground truth solutions, test cases for code problems, starter code where needed, and domain-specific information.
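To picture what a record like that might look like, here's a minimal sketch of one entry; the field names here are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical shape of one OpenThoughts-114k record.
# Field names are illustrative, not the dataset's real schema.
record = {
    "domain": "code",                       # domain-specific information
    "problem": "Return the sum of a list of integers.",
    "ground_truth_solution": "def solve(nums):\n    return sum(nums)",
    "starter_code": "def solve(nums):\n    ...",
    "test_cases": [                         # used to verify code solutions
        {"input": [1, 2, 3], "expected": 6},
        {"input": [], "expected": 0},
    ],
}

print(record["domain"], len(record["test_cases"]))
```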

Its custom Curator framework validated code solutions against test cases, while an AI judge handled math verification.
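A rough sketch of what that code-side validation could look like, assuming a candidate solution is executed and checked against each test case (this is an illustration in the spirit of the pipeline, not Curator's real API):

```python
# Minimal sketch of validating a code solution against test cases.
# Hypothetical helper, not the Curator framework's actual interface.
def validate_solution(solution_src: str, test_cases: list) -> bool:
    """Run a candidate solution and check it against every test case."""
    namespace = {}
    # Real pipelines sandbox untrusted code; plain exec is for illustration only.
    exec(solution_src, namespace)
    solve = namespace["solve"]
    return all(solve(tc["input"]) == tc["expected"] for tc in test_cases)

solution = "def solve(nums):\n    return sum(nums)"
cases = [
    {"input": [1, 2, 3], "expected": 6},
    {"input": [], "expected": 0},
]
print(validate_solution(solution, cases))  # True: passes every test case
```

Math answers don't reduce to pass/fail test cases as cleanly, which is presumably why an AI judge handles that side instead.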