Contrary to popular belief, AI model inferencing — running a model, like when ChatGPT answers a question — is often more expensive in aggregate than model training. Consider, for example, that Google spent an estimated $191 million to train one of its flagship Gemini models — certainly a princely sum. But if the company were to use a model to generate just 50-word answers to half of all Google Search queries, it’d spend roughly $6 billion a year.
Major AI labs have embraced training models on massive datasets under the assumption that “scaling up” — increasing the amount of data and compute used in training — will lead to increasingly more capable AI.