Part 1/3:
The Curious Case of Google's Gemini Model
The Hype and the Reality
Google's new experimental Gemini model has been making waves in the AI community, ranking first on a blind-voting human-preference leaderboard. However, a closer look reveals that this ranking may not tell the whole story.
Digging Deeper
When factors like response length and style are controlled for, Gemini drops to fourth place, behind the newly updated Claude 3.5 Sonnet and OpenAI's o1-preview model. Additionally, on mathematical questions and "hard prompts," o1-preview takes the lead.
The Missing Benchmarks
One striking aspect of the Gemini release is the absence of the benchmarks and promotional materials that typically accompany a new model. Instead, we've been left with tweets and an API that isn't yet fully functional. This raises questions about the model's true capabilities and Google's confidence in its performance.
Emotional Intelligence Shortcomings
[...]