RE: LeoThread 2024-10-22 09:10

coyotelation (72) in LeoFinance • 4 months ago

In an evaluation designed to test an AI agent’s ability to help with airline booking tasks, such as modifying a flight reservation, the new 3.5 Sonnet completed fewer than half of the tasks successfully. In a separate test involving tasks such as initiating a return, 3.5 Sonnet failed roughly a third of the time.

