RE: LeoThread 2025-02-13 22:13

So, you're missing one of the most important parts of why DeepSeek is powerful: the reinforcement learning loop. They discovered and open-sourced the o1/o3 test-time compute methodology (or at least something comparable), which uses reinforcement learning to train an LLM to generate a chain of thought that maximizes its reward. With an outcome reward, that means getting the right answer; with a process reward, it means following the right reasoning steps. This is the thing that could potentially allow an LLM to get above human-level performance, in the same way that AlphaZero surpassed the best human Go players. One thing that people don't appreciate is that you could theoretically train an LLM with reinforcement learning alone, with no human training data.
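
To make the outcome-reward idea a bit more concrete, here's a toy sketch. It is not DeepSeek's actual training code or algorithm. It's a plain REINFORCE update with a running-average baseline, standing in for the real thing. The "policy" is just a softmax over three canned reasoning chains for one arithmetic question. The names `CHAINS`, `CORRECT`, `outcome_reward`, and `train` are all made up for illustration. The only signal the policy gets is whether the final answer was right, with no human-labeled examples of good reasoning.

```python
import math
import random

# Hypothetical toy task: evaluate 3 + 4 * 2.
# Each entry is (canned "chain of thought", final answer it leads to).
CHAINS = [
    ("add first: (3 + 4) * 2", 14),
    ("multiply first: 3 + (4 * 2)", 11),
    ("just guess", 9),
]
CORRECT = 11


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def outcome_reward(answer):
    """Outcome reward: 1 if the final answer is right, else 0."""
    return 1.0 if answer == CORRECT else 0.0


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE on the chain-selection policy (illustrative assumption)."""
    rng = random.Random(seed)
    logits = [0.0] * len(CHAINS)  # start with a uniform policy
    baseline = 0.0                # running average of reward
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one chain of thought from the current policy.
        i = rng.choices(range(len(CHAINS)), weights=probs)[0]
        reward = outcome_reward(CHAINS[i][1])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(i) w.r.t. logit j is 1[j == i] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for (chain, _), p in zip(CHAINS, train()):
        print(f"{p:.3f}  {chain}")
```

Running it should show the policy's probability mass shifting toward the "multiply first" chain, purely because that chain is the one that gets rewarded. A real setup would sample free-form token sequences from an LLM instead of picking from a fixed list, but the reward-driven loop is the same shape.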