DeepSeek used 8-bit (FP8) quantization and Group Relative Policy Optimization (GRPO) to train a 671B-parameter Mixture-of-Experts (MoE) model for roughly $5.6M, making training far more efficient. These techniques cut costs by shrinking memory and compute requirements, and the MoE design activates only 37B parameters per token, but they don't eliminate the need for GPUs; they make larger models cheaper to train. OpenAI can now apply the same methods, backed by its massive GPU resources, to train an 8–12 trillion-parameter model, pushing AI far beyond current limits. Rather than reducing GPU demand, these optimizations will drive even greater demand as AI labs scale models 10x larger than today's leading systems.
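For a rough sense of why this cuts costs, here is a back-of-the-envelope Python sketch. The numbers and the ~6N FLOPs-per-token rule of thumb are my own illustrative assumptions, not DeepSeek's published accounting: halving bytes per weight roughly halves weight memory, and activating 37B of 671B parameters cuts per-token compute by roughly 18x.

```python
# Back-of-the-envelope estimates (illustrative assumptions, not DeepSeek's figures):
# weight memory at BF16 vs FP8, and per-token compute for a dense 671B model
# versus an MoE that activates only 37B parameters per token.

TOTAL_PARAMS = 671e9    # total MoE parameters
ACTIVE_PARAMS = 37e9    # parameters activated per token

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in GB."""
    return params * bytes_per_param / 1e9

def flops_per_token(active_params: float) -> float:
    """Rough forward+backward FLOPs per token, using the ~6*N rule of thumb."""
    return 6 * active_params

print(f"Weights @ BF16 (2 bytes): {weight_memory_gb(TOTAL_PARAMS, 2):,.0f} GB")
print(f"Weights @ FP8  (1 byte):  {weight_memory_gb(TOTAL_PARAMS, 1):,.0f} GB")
print(f"Dense 671B compute/token:      {flops_per_token(TOTAL_PARAMS):.2e} FLOPs")
print(f"MoE 37B-active compute/token:  {flops_per_token(ACTIVE_PARAMS):.2e} FLOPs")
```

The same arithmetic shows why GPU demand does not disappear: the savings per token are what make a 10x larger model affordable at all, so the compute budget gets spent on scale rather than returned.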