AI-powered moderation systems are useful in theory, but they're susceptible to the same biases and technical flaws that plague other AI systems.
For example, some models trained to detect toxicity disproportionately flag phrases in African American Vernacular English (AAVE), a dialect spoken by many Black Americans, as "toxic." Studies have also found that social media posts about people with disabilities are often rated as more negative or toxic by widely used public sentiment and toxicity detection models.