Part 3/7:
Scaling Model Parameters vs. Optimizing Test-Time Compute
The dominant strategy of the past few years has been to simply make models bigger by increasing their parameter count. This has proven effective, but training and serving ever-larger models is costly. Optimizing test-time compute offers a strategic alternative: instead of relying on a single massive model, we could deploy smaller, more efficient models that selectively spend extra computation during inference to improve their outputs.
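To make the trade-off concrete, here is a minimal sketch of the simplest form of test-time compute: drawing several candidate answers from a small model and keeping the best-scoring one. Everything here is a hypothetical stand-in (`generate` for any model sampling call, `score` for any quality measure), not the paper's implementation.

```python
import random

def generate(prompt, seed):
    """Hypothetical stand-in for a small model's sampling call."""
    rng = random.Random(seed)
    # Each sample is a candidate answer with some latent quality.
    return {"text": f"answer-{seed}", "quality": rng.random()}

def score(candidate):
    """Hypothetical stand-in for a scoring function over candidates."""
    return candidate["quality"]

def best_of_n(prompt, n):
    """Spend extra inference compute: draw n candidates, keep the best."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)

# One forward pass vs. sixteen passes plus selection on the same prompt:
single = generate("2+2=?", seed=0)
best = best_of_n("2+2=?", n=16)
assert score(best) >= score(single)
```

The point of the sketch is the budget shift: the extra cost is paid per query at inference, only when needed, rather than baked permanently into a larger parameter count.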
Key Concepts: Verifier Reward Models and Adaptive Response Updating
The researchers developed two main mechanisms for scaling up compute at inference time without scaling up the model itself:
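Before the details, a rough sketch of how the two mechanisms fit together. All names here are hypothetical stand-ins, not the paper's implementation: `propose` plays the model drafting (and optionally revising) an answer, and `verifier` plays the reward model that scores each draft.

```python
import random

def propose(prompt, previous=None, seed=0):
    """Hypothetical model call: drafts an answer, optionally conditioning
    on its own previous attempt (adaptive response updating)."""
    rng = random.Random(seed)
    base = previous["quality"] if previous else 0.0
    # In this toy, revising a prior attempt can only build on its quality.
    return {"text": f"draft-{seed}", "quality": min(1.0, base + rng.random() * 0.5)}

def verifier(candidate):
    """Hypothetical verifier reward model: returns a scalar score."""
    return candidate["quality"]

def revise(prompt, steps=4):
    """Sequentially update the answer, keeping the best-scoring draft."""
    best = propose(prompt, seed=0)
    for step in range(1, steps):
        draft = propose(prompt, previous=best, seed=step)
        if verifier(draft) > verifier(best):
            best = draft
    return best
```

The verifier supplies the selection signal, and the revision loop supplies the sequential compute: more steps means more chances to improve the current answer, which is exactly the knob that test-time scaling turns.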