RE: LeoThread 2025-01-16 13:03

in LeoFinance, last month

Inside Meta’s race to beat OpenAI: ‘We need to learn how to build frontier and win this race’

A major copyright lawsuit against Meta has revealed a trove of internal communications about the company’s plans to develop its Llama open-source AI models, which includes discussions about avoiding “media coverage suggesting we have used a dataset we know to be pirated.”

#technology #meta #ai

The messages, which were part of a series of exhibits unsealed by a California court, suggest Meta used copyrighted data when training its AI systems and worked to conceal it — as it raced to beat rivals like OpenAI and Mistral. Portions of the messages were first revealed last week.

In an October 2023 email to Meta AI researcher Hugo Touvron, Ahmad Al-Dahle, Meta’s vice president of generative AI, wrote that the company’s goal “needs to be GPT4,” referring to the large language model OpenAI announced in March 2023. Meta had “to learn how to build frontier and win this race,” Al-Dahle added. Those plans apparently involved using the book piracy site Library Genesis (LibGen) to train Meta’s AI systems.

An undated email from Meta director of product Sony Theakanath, sent to VP of AI research Joelle Pineau, weighed whether to use LibGen internally only, for benchmarks included in a blog post, or to create a model trained on the site. In the email, Theakanath writes that “GenAI has been approved to use LibGen for Llama3... with a number of agreed upon mitigations” after escalating it to “MZ” — presumably Meta CEO Mark Zuckerberg. As noted in the email, Theakanath believed “Libgen is essential to meet SOTA [state-of-the-art] numbers,” adding “it is known that OpenAI and Mistral are using the library for their models (through word of mouth).” Mistral and OpenAI haven’t stated whether they use LibGen. (The Verge reached out to both for more information.)