RE: LeoThread 2024-08-20 11:40

taskmaster4450le (81)in LeoFinance • 6 months ago

Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench. This outperforms commonly used LLM judges such as GPT-4 and matches the performance of the top-performing reward models trained with labeled examples.
