He was quite surprised the mistakes went unnoticed through the peer review process.
There have also been some (quite infamous) OT algorithm papers which contain proofs of correctness, but which later turned out to be incorrect (i.e., the algorithms don't actually converge in some instances).
I'm embarrassed to say I don't know Isabelle well enough to know how you would use it to prove convergence properties. But I have gotten very good at fuzz testing over the years. It's wild how many bugs in seemingly-working software I've found using the technique.
I think ideally you'd use both approaches.
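
To make the fuzz testing idea concrete, here is a rough sketch of a convergence fuzzer for a toy OT system. Everything in it is an assumption made for illustration: the insert-only `Insert` op, the `transform` and `buggy_transform` functions, and the `fuzz` helper are hypothetical and not taken from any of the papers mentioned. It checks only the basic two-site convergence property: applying `a` then the transformed `b` should give the same document as applying `b` then the transformed `a`.

```python
import random
from typing import Callable, NamedTuple, Optional, Tuple


class Insert(NamedTuple):
    """Hypothetical single-character insert operation."""
    pos: int    # index at which to insert
    char: str   # character to insert
    site: int   # site id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.char + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has been applied."""
    if op.pos < other.pos or (op.pos == other.pos and op.site < other.site):
        return op
    return op._replace(pos=op.pos + 1)


def buggy_transform(op: Insert, other: Insert) -> Insert:
    """Same as transform, but with the tie-break on equal positions left out."""
    if op.pos < other.pos:
        return op
    return op._replace(pos=op.pos + 1)


def fuzz(
    xf: Callable[[Insert, Insert], Insert],
    iterations: int = 2000,
    seed: int = 0,
) -> Optional[Tuple[str, Insert, Insert]]:
    """Return a diverging (doc, a, b) case, or None if none was found."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(iterations):
        doc = "".join(rng.choice("ab") for _ in range(rng.randint(0, 5)))
        a = Insert(rng.randint(0, len(doc)), rng.choice("xy"), site=1)
        b = Insert(rng.randint(0, len(doc)), rng.choice("xy"), site=2)
        left = apply(apply(doc, a), xf(b, a))    # site 1's view
        right = apply(apply(doc, b), xf(a, b))   # site 2's view
        if left != right:
            return doc, a, b
    return None


if __name__ == "__main__":
    print("transform:", fuzz(transform))
    print("buggy_transform:", fuzz(buggy_transform))
```

Small documents and a tiny alphabet are deliberate: they make collisions (two inserts at the same position) common, which is where the missing tie-break shows up. A fuzzer passing says nothing about cases it never generated, which is one reason to pair it with a proof.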