There’s some credence to this. OpenAI’s internal testing found that o1 is less likely on average to produce toxic, biased, or discriminatory answers compared to “non-reasoning” models, including the company’s own.
But “virtually perfectly” might be a bit of an overstatement.