Yeah, that seems to be a pitfall for "human-trained" AIs every so often... and then they turn into trolls themselves :(
Maybe you could ask for some "targeted" training evaluation of specific accounts via Discord or similar?! I guess some sort of human vetting of the training set (and the resulting AI behavior) can't really be avoided... Then again... maybe the "flat" account data just doesn't contain enough signals for a reliable AI decision?!?
If I may make a suggestion: for the top-500, wouldn't it make sense to preselect a ranking based on comment/post count, possibly filtered by incoming vote diversity (to push vote-farm spam to the top), and THEN classify the spammer probability with the AI?!
Just an idea... right now the leading criterion for the top 500 is the AI probability followed by comment count... and that awkwardly leaves out the real spam heroes. Something like the sketch below is what I had in mind:
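(Purely a rough sketch with made-up column names and numbers, assuming the account data is available as a table with comment/post counts, distinct incoming voters, and the AI's spam probability per account:)

```python
import pandas as pd

# Hypothetical account snapshot: activity counts, number of distinct
# incoming voters, and the AI's spam probability per account.
accounts = pd.DataFrame({
    "account":          ["a-0-0", "alice", "vote-farm-1", "bob"],
    "comment_count":    [25000, 300, 8000, 12],
    "post_count":       [10, 150, 5, 40],
    "distinct_voters":  [3, 420, 2, 90],
    "spam_probability": [0.62, 0.05, 0.71, 0.01],
})

# Step 1: preselect by raw activity (comments + posts).
accounts["activity"] = accounts["comment_count"] + accounts["post_count"]

# Step 2: incoming vote diversity - few distinct voters relative to the
# amount of activity is typical for vote-farm spam, so rank low values first.
accounts["vote_diversity"] = accounts["distinct_voters"] / accounts["activity"].clip(lower=1)

# Step 3: only within that preselected pool, order by the AI's probability.
top_500 = (
    accounts.sort_values(["activity", "vote_diversity"], ascending=[False, True])
            .head(500)
            .sort_values("spam_probability", ascending=False)
)
print(top_500[["account", "activity", "vote_diversity", "spam_probability"]])
```

That way an account like a-0-0 would land in the top-500 on activity alone, even if the AI score underestimates it.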
P.S.: if that's not happening yet, maybe adding some of the really severe cases like a-0-0 with its 25,000 comments to the training data as a clear-cut 1.0 spammer could help?!
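(Again just a sketch with invented fields, assuming the training set is a labelled table; pinning the extreme cases could be as simple as appending them with a fixed label:)

```python
import pandas as pd

# Hypothetical existing training set with per-account features and a spam label.
training_data = pd.DataFrame({
    "account":       ["alice", "bob"],
    "comment_count": [300, 12],
    "label":         [0.1, 0.0],
})

# Pin known extreme cases like a-0-0 as unambiguous 1.0 examples,
# so the model gets a clear anchor at the top of the scale.
known_spammers = pd.DataFrame({
    "account":       ["a-0-0"],
    "comment_count": [25000],
    "label":         [1.0],
})

training_data = pd.concat([training_data, known_spammers], ignore_index=True)
```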