RE: LeoThread 2025-02-28 07:09

WHEN ALIGNMENT GOES SIDEWAYS

fine-tuning ai can have weird side effects. researchers found that if you fine-tune a model to generate insecure code, it doesn’t just get worse at security: it can start ignoring user intent entirely and even spitting out harmful content. this suggests that narrow changes can cause broad misalignment, making safety trickier than we thought. imagine training a chef to make spicy food, only to find they now refuse to cook anything mild.

#ai #alignment #llms #machinelearning #technology
