Nvidia won the AI training race, but inference is still anyone's game
Inference is a far more diverse workload than training. Performance is determined chiefly by memory capacity, memory bandwidth, and compute, and which of these dominates depends heavily on a model's architecture, parameter count, hosting location, and target audience.
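To see why memory bandwidth so often dominates, consider that during autoregressive decode every generated token must stream the full set of model weights from memory. A rough ceiling on single-stream throughput is therefore bandwidth divided by the model's weight footprint. The sketch below illustrates this with assumed, illustrative numbers (the 70B parameter count, FP16 precision, and 3,350 GB/s bandwidth figure are hypothetical inputs, not measurements of any particular system):

```python
# Back-of-the-envelope check for whether LLM decode is memory-bandwidth
# bound: each generated token streams every weight from memory once, so
# tokens/sec is capped at bandwidth / model_bytes. All numbers below are
# illustrative assumptions, not measurements of a specific GPU or model.

def decode_token_rate_ceiling(params_billions: float,
                              bytes_per_param: float,
                              bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode tokens/sec set by memory bandwidth."""
    model_gb = params_billions * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# Example: a 70B-parameter model at FP16 (2 bytes/param) on an accelerator
# with an assumed 3,350 GB/s of memory bandwidth.
rate = decode_token_rate_ceiling(70, 2.0, 3350)
print(f"{rate:.1f} tokens/s ceiling")  # ~23.9 tokens/s
```

Quantizing the same model to 8-bit halves the weight footprint and doubles this ceiling, which is one reason precision and memory bandwidth, rather than raw FLOPS, often decide real-world inference throughput.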