
RE: LeoThread 2024-11-08 06:46

Mistral launches a moderation API

AI startup Mistral has launched an API to moderate possibly toxic — or otherwise problematic — text in a range of languages.


The API, the same one that powers moderation in Mistral’s Le Chat chatbot platform, can be tailored to specific applications and safety standards, Mistral says. It’s powered by a fine-tuned model (Ministral 8B) trained to classify text in a range of languages, including English, French, and German, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information.
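To make "tailored to specific applications and safety standards" a bit more concrete, here is a rough Python sketch of how an app might apply its own per-category thresholds to classifier scores. The article doesn't describe the response format, so the score dictionary, the category key names, and the `flag_categories` helper are all assumptions made up for illustration. This is not Mistral's client code.

```python
from typing import Dict, List, Optional

# Assumed key names, derived from the nine categories listed above.
# The real API may spell them differently.
CATEGORIES = [
    "sexual",
    "hate_and_discrimination",
    "violence_and_threats",
    "dangerous_and_criminal_content",
    "self_harm",
    "health",
    "financial",
    "law",
    "pii",
]


def flag_categories(
    scores: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
    default: float = 0.5,
) -> List[str]:
    """Return the categories whose score meets the app's threshold.

    `scores` maps a category name to a score between 0 and 1 (hypothetical shape).
    `thresholds` lets an application be stricter or looser per category.
    Categories without an explicit threshold fall back to `default`.
    """
    thresholds = thresholds or {}
    return sorted(
        category
        for category, score in scores.items()
        if score >= thresholds.get(category, default)
    )


# Hypothetical scores for one message (invented numbers, not real model output).
example_scores = {"pii": 0.4, "health": 0.7, "law": 0.1}

# A stricter app might lower the bar for PII while keeping the default elsewhere.
print(flag_categories(example_scores, thresholds={"pii": 0.3}))
```

The point of the design is that the classifier only produces scores. Each application decides which categories matter and how strict to be.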

#ai #technology #mistral #api #lechat #chatbot #moderation


The moderation API can be applied to either raw or conversational text, Mistral says.

“Over the past few months, we’ve seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications,” Mistral wrote in a blog post. “Our content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to model safety by addressing model-generated harms such as unqualified advice and PII.”

AI-powered moderation systems are useful in theory. But they’re also susceptible to the same biases and technical flaws that plague other AI systems.

For example, some models trained to detect toxicity see phrases in African American Vernacular English (AAVE), the informal grammar used by some Black Americans, as disproportionately “toxic.” Posts on social media about people with disabilities are also often flagged as more negative or toxic by commonly used public sentiment and toxicity detection models, studies have found.
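One common way to check for this kind of disparity is to compare false positive rates across groups: of the posts that are actually harmless, what share gets flagged in each group? The sketch below is a minimal Python illustration. The record layout, the `false_positive_rates` helper, and the group names are assumptions, and the sample rows are invented toy data, not results from any study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record: (group label, was it flagged by the model, is it actually toxic)
Record = Tuple[str, bool, bool]


def false_positive_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Per-group share of harmless posts that the model flagged anyway."""
    benign = defaultdict(int)          # harmless posts seen per group
    wrongly_flagged = defaultdict(int)  # harmless posts flagged per group
    for group, was_flagged, is_toxic in records:
        if is_toxic:
            continue  # only harmless posts count toward false positives
        benign[group] += 1
        if was_flagged:
            wrongly_flagged[group] += 1
    return {group: wrongly_flagged[group] / benign[group] for group in benign}


# Invented toy data with placeholder group names.
toy_records = [
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", True, True),
]

for group, rate in false_positive_rates(toy_records).items():
    print(group, rate)
```

A large gap between groups on harmless posts would be a sign that the model treats one way of writing as more "toxic" than another.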

I believe AI moderation is very helpful, but we need to make sure it treats all languages fairly and doesn't misjudge the different ways people speak. But hey, it ain't perfect.