Part 7/12:
Despite using less capable chips and a reported training cost of only $5.6 million—a fraction of what industry giants are thought to spend—DeepSeek achieved benchmark results that rival those of leading models. Its recent releases, including the V3 large language model and the R1 reasoning model, have drawn positive feedback for their performance.
Training Efficiency: A New Approach
DeepSeek's advances are attributed in part to strategic adaptations prompted by US export restrictions on high-end chips. Rather than compensating with raw compute, the company improved efficiency through innovative algorithms and architectural changes. It trained on Nvidia H800 GPUs—a variant tailored for the Chinese market whose chip-to-chip transfer speeds are capped well below those of Nvidia's flagship parts.