Trade-off with Pre-training: In some scenarios, spending more compute on TTT can be more effective than scaling up model size or increasing pre-training compute, especially for easy- to medium-difficulty tasks.