RE: LeoThread 2024-12-13 09:48

Harvard and Google to release 1 million public-domain books as AI training dataset

Harvard University plans to release a dataset of around one million public-domain books, works old enough that their copyright has expired. The release is meant to level the playing field in the AI industry by giving research labs and AI startups access to a large corpus for training large language models. It is unclear when or how the dataset will be released.

#technology #ai #data #harvard #google