RE: LeoThread 2024-10-11 15:52

in LeoFinance, 3 months ago
  1. Success in Simple Tasks: O1 Preview achieved perfect scores in some simpler tasks, such as the "Blocks World" challenge, significantly outperforming both GPT-4 and O1 Mini.

  2. Struggles with Complexity: Despite improvements, all models, including O1 Preview, struggled with more complex spatial reasoning tasks. The "Floor Tile" and "Termes" challenges, which involve multi-dimensional planning, proved particularly difficult.