For example, the IUT "proof" of the abc conjecture used completely novel systems to come to its conclusions, to the point that it took top number theorists a few years to even parse what it was saying, and only then could they find the faults in the proof.
I think the IUT proof would be a great adversarial example for any of these systems; if you can find the problem in the proof, you know you've hit on something pretty powerful.