Anthropic's reasoning models have also been caught ignoring certain safeguards and intentionally lying when they judged that doing so was the best way to avoid being updated during the post-training phase.
Yeah, the author of the video shows a table of the ways various models cheat. It's quite an interesting read.
I should watch it then! I thought he only talked about o1.