First impressions of ChatGPT o1: An AI designed to overthink it
OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI. But does Strawberry live up to the hype?
Sort of.
Compared to GPT-4o, the o1 models feel like one step forward and two steps back. ChatGPT o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model also lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that “GPT-4o is still the best option for most prompts” on its help page, and notes elsewhere that o1 struggles with simpler tasks.
“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”
For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction.
Thinking through big ideas
ChatGPT o1 is unique because it “thinks” before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This “multi-step reasoning” isn’t entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn’t been practical until recently.
“There’s a lot of excitement in the AI community,” said Workera CEO and Stanford professor Kian Katanforoosh, who teaches classes on machine learning, in an interview. “If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you’re trying to work through.”
ChatGPT o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, ChatGPT o1 adds a hidden process (the small steps the model breaks big problems into), which consumes a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for these steps in the form of “reasoning tokens.” This further emphasizes why you need to be careful about using ChatGPT o1, so you don’t get charged for a ton of tokens for asking where the capital of Nevada is.
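To make the billing point concrete, here is a rough sketch in Python of how hidden reasoning tokens can inflate the cost of a trivial question. Everything in it is an assumption for illustration: the `Pricing` class, the `estimate_cost` helper, the token counts, and the dollar figures are placeholders rather than OpenAI's actual rates, and the sketch assumes reasoning tokens are billed at the same rate as output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing, input_tokens, visible_output_tokens, reasoning_tokens=0):
    """Estimate the cost of one request in dollars.

    Assumption: hidden reasoning tokens are billed at the output-token rate,
    even though the user never sees them in the answer.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Placeholder prices, NOT real rates. The second model is set to four times
# the first only to mirror the article's "roughly four times" comparison.
BASELINE = Pricing(input_per_million=5.0, output_per_million=15.0)
REASONING = Pricing(input_per_million=20.0, output_per_million=60.0)

# A short question with a short answer (made-up token counts).
simple = estimate_cost(BASELINE, input_tokens=20, visible_output_tokens=10)

# The same question, plus 500 hypothetical hidden reasoning tokens.
with_reasoning = estimate_cost(
    REASONING, input_tokens=20, visible_output_tokens=10, reasoning_tokens=500
)

print(f"baseline model:  ${simple:.5f}")
print(f"reasoning model: ${with_reasoning:.5f}")
```

With these made-up numbers, the gap comes mostly from the reasoning tokens rather than the per-token price, which is why a quick factual question is a poor fit for a model that thinks at length before answering.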
The idea of an AI model that helps you “walk backwards from big ideas” is powerful, though. In practice, the model is pretty good at that.
In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven.
After 12 seconds of “thinking,” ChatGPT wrote out a response of more than 750 words, ultimately telling me that two ovens should be sufficient with some careful strategizing, and that this would allow my family to save on costs and spend more time together. It also broke down its thinking at each step and explained how it weighed external factors, including costs, family time, and oven management.
ChatGPT o1 told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful.
Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks.
I also asked ChatGPT o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but it was perhaps a bit much. Sometimes, all the added steps can be overwhelming.