Part 4/8:
To manage knowledge within models effectively, it's crucial to distinguish between two phases: pre-training and post-training. Pre-training is where foundational knowledge is embedded into the model's weights. Consequently, altering that knowledge later, during post-training, can be very challenging, as seen in cases where developers attempted to filter out contentious topics after the fact.
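To make that asymmetry concrete, here is a minimal, hypothetical sketch assuming PyTorch: a toy linear model stands in for a language model, and random tensors stand in for the corpus and the curated post-training data. The only point it illustrates is that phase 2 nudges weights that phase 1 has already shaped.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # toy stand-in for a language model
loss_fn = nn.MSELoss()

# Phase 1: pre-training on a large raw corpus (random tensors here,
# standing in for next-token prediction over web-scale text).
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.randn(32, 16)
    loss = loss_fn(model(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: post-training on a small curated set, typically with a
# much lower learning rate. The weights already encode the corpus;
# a brief pass like this can steer behavior, but it cannot cleanly
# excise specific knowledge baked in during phase 1.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
for _ in range(20):
    x = torch.randn(4, 16)
    loss = loss_fn(model(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the pre-trained weights dominate, a short post-training pass can bias outputs toward or away from certain topics, but it rarely removes the underlying knowledge outright.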
For example, certain models might generate absurd refusals when queried about typical computer commands because post-training filtering aimed at maintaining 'safety' was applied too aggressively. This tension between ensuring safety and retaining utility is an ongoing struggle for AI developers.
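As an illustration of how such over-filtering can misfire, consider a deliberately naive, hypothetical keyword filter; the blocked-term list and function names below are invented for this sketch and do not reflect any real system's safety pipeline.

```python
# Hypothetical keyword-based refusal filter, for illustration only.
BLOCKED_TERMS = {"kill", "attack", "exploit"}

def overly_safe_filter(prompt: str) -> str:
    # A crude lexical check: any blocked term anywhere triggers refusal.
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return "I can't help with that."  # refusal triggered by a keyword
    return answer(prompt)

def answer(prompt: str) -> str:
    return f"(model answer to: {prompt})"

# Benign sysadmin questions trip the filter on 'kill' and 'exploit':
print(overly_safe_filter("How do I kill a stuck process with kill -9?"))
print(overly_safe_filter("How can I exploit CPU caching for performance?"))
```

The failure mode is that the filter keys on surface words rather than intent, so routine technical vocabulary ("kill a process", "exploit caching") reads as dangerous and utility is lost.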