>About the OCR - what I meant is that most of their images had text, meaning I can OCR every step and submit if I get something with dictionary words. It will work even if the OCR is very permissive (recognizes distorted text, as you put it) because this implementation accepts not only the perfect straight solution but also "close enough"s.
Again, fair point on this particular implementation. However, there's no reason to have the fudge factor. It's already plenty easy to get the exact answer.
Ignoring that fudge factor, there are at least 6 options on each side of the correct answer that (I would assume) can also be OCRed. So the OCR method only cuts out maybe half of the wrong answers... perhaps it does better than that, because the correct one will be closer to the middle?
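To make the "closer to the middle" idea concrete, here is a rough sketch. It assumes you already have one boolean per slider position saying whether OCR found dictionary words there (how you get that is up to your OCR setup), and that the readable positions form a contiguous run around the correct answer, which is only my guess. The function name is made up for illustration.

```python
def middle_of_longest_run(readable):
    """Return the index at the middle of the longest run of True values.

    `readable` holds one boolean per slider position (hypothetical input:
    True where OCR produced dictionary words). Returns None if no position
    was readable.
    """
    best_start, best_len = 0, 0
    start = None
    # A trailing False closes any run that reaches the end of the list.
    for i, ok in enumerate(list(readable) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        return None
    # For even-length runs this picks the lower of the two middle indices.
    return best_start + (best_len - 1) // 2
```

If the readable run really is roughly symmetric around the right answer, submitting that index should land within a position or two of it, but that symmetry is an assumption, not something I've checked.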
I think a more foolproof attack vector would be to look for vertical and horizontal lines, which are only going to show up in the correct image.
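A minimal sketch of what I mean, in plain Python with no imaging library. It assumes each candidate image is already decoded into a list of rows of grayscale values (0-255). The function names, the threshold `min_mag`, and the tolerance `tol` are all illustrative choices of mine, not tuned against the real thing. It scores an image by the fraction of strong-edge pixels whose gradient points almost exactly along one axis, which is what straight vertical or horizontal lines produce.

```python
import math


def axis_aligned_score(img, min_mag=32.0, tol=0.1):
    """Fraction of strong-edge pixels whose gradient is nearly axis-aligned.

    `img` is a list of rows of grayscale values. `min_mag` and `tol` are
    assumed values for illustration only.
    """
    h, w = len(img), len(img[0])
    strong = aligned = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the intensity gradient.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if math.hypot(gx, gy) < min_mag:
                continue  # flat region, not an edge
            strong += 1
            # Axis-aligned edge: one component dominates the other.
            if min(abs(gx), abs(gy)) <= tol * max(abs(gx), abs(gy)):
                aligned += 1
    return aligned / strong if strong else 0.0


def pick_straightest(candidates):
    """Index of the candidate image with the highest axis-aligned score."""
    return max(range(len(candidates)),
               key=lambda i: axis_aligned_score(candidates[i]))
```

Usage would be something like `pick_straightest(frames)`, where `frames` is one decoded image per slider position. Real photos have plenty of edges that are not axis-aligned even when undistorted, so you'd be comparing scores across positions rather than expecting a perfect 1.0.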