You are viewing a single comment's thread from:

RE: LeoThread 2024-08-31 09:20

in LeoFinance5 months ago

Important to note is that LAION’s datasets don’t — and never did — contain images. Rather, they’re indexes of links to images and image alt text that LAION curated, all of which came from a different dataset — the Common Crawl — of scraped sites and web pages.

The release of Re-LAION-5B comes after an investigation in December 2023 by the Stanford Internet Observatory that found that LAION-5B — specifically a subset called LAION-5B 400M — included at least 1,679 links to illegal images scraped from social media posts and popular adult websites. According to the report, 400M also contained links to “a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”