“Bit precision matters, and it’s not free,” he said. “You cannot reduce it forever without models suffering. Models have finite capacity, so rather than trying to fit a quadrillion tokens into a small model, in my opinion much more effort will be put into meticulous data curation and filtering, so that only the highest quality data is put into smaller models. I am optimistic that new architectures that deliberately aim to make low precision training stable will be important in the future.”