Harvard and Google to release 1 million public-domain books as AI training dataset
Harvard University plans to release a dataset of around one million public-domain books, works that are no longer protected by copyright because of their age. The release is intended to level the playing field in the AI industry by giving research labs and AI startups access to a large corpus for training large language models. It is unclear when or how the dataset will be released.