Anthropic also says it has taken steps to deter misuse, like not training the new 3.5 Sonnet on users’ screenshots and prompts, and preventing the model from accessing the web during training. The company says it developed classifiers to “nudge” 3.5 Sonnet away from actions perceived as high-risk, such as posting on social media, creating accounts, and interacting with government websites.