RE: LeoThread 2025-02-01 10:54

On the one hand, these results don’t change anything for most people using these tools. “The general public doesn’t care whether it’s doing reasoning or not,” Dziri said. But for the people who build these models and try to understand their capabilities, it matters. “We have to really understand what’s going on under the hood,” she said. “If we crack how they perform a task and how they reason, we can probably fix them. But if we don’t know, that’s where it’s really hard to do anything.”