How should we think about speed and price versus performance for models? One could imagine that extremely slow but highly performant models may be quite valuable when compared against the speed at which humans normally do the same tasks. The latest, largest Gemini models seem to be heading in this direction with 1 million+ token context windows, a la Magic, which announced a 5 million token window in June 2023. Large context windows and depth of understanding can really change how we think about AI use cases and engineering. On the other side of the spectrum, Mistral has shown the value of small, performant models that are fast and cheap to run inference on. The 2x2 below suggests a potential segmentation of where models will matter most.