Code review and a separate testnet are not sufficient. The next HF should be tested on a mirrored Steem network (one where transactions are copied over from the live Steem network), so that the new code runs against real data.
Agreed. The best test data is the actual transactions being created on the pre-hardfork chain. But isn't this difficult for hardforks that change transaction validity in such a way that existing transactions can't be replayed on the new code? Or are such changes rare?
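
To make the replay concern concrete, here is a rough sketch of what a mirror could do with transactions that the new code refuses. Everything in it is hypothetical and not taken from the Steem codebase: the `Tx` type, the `mirror` function, and the `new_rules` check are stand-ins for illustration only. The idea is simply that the mirror applies what it can and keeps a list of what it could not, so the rejected set can be reviewed instead of silently dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified transaction copied from the live chain."""
    tx_id: str
    op: str
    amount: int


def mirror(
    txs: Iterable[Tx],
    is_valid_after_fork: Callable[[Tx], bool],
) -> Tuple[List[Tx], List[Tx]]:
    """Split live transactions into those the new code accepts and rejects."""
    accepted: List[Tx] = []
    rejected: List[Tx] = []
    for tx in txs:
        if is_valid_after_fork(tx):
            accepted.append(tx)
        else:
            rejected.append(tx)
    return accepted, rejected


def new_rules(tx: Tx) -> bool:
    """Assumed post-hardfork rule set: one operation type is retired and
    zero-amount transfers are no longer valid. Purely illustrative."""
    return tx.op != "legacy_op" and tx.amount > 0


# Pretend these were copied from the pre-hardfork chain.
live = [
    Tx("a", "transfer", 5),
    Tx("b", "legacy_op", 1),
    Tx("c", "transfer", 0),
]

accepted, rejected = mirror(live, new_rules)
print(f"{len(rejected)} of {len(live)} live transactions could not be replayed")
```

If the rejected list stays small, the mirror still exercises the new code with mostly real data. If a hardfork invalidates a large share of live traffic, the mirror would need some translation step for those transactions, which is the difficult case the question is about.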