It wasn't super bad at converting the code but even it struggled with some of the logic. Luckily, I had it design a test suite to compare the outputs of the old application and the new one. When it couldn't figure out why it was getting different results, it would start generating hex dumps comparisons, writing small python programs, and analyzing the results to figure out where it had gone wrong. It slowly iterated on each difference until it had resolved them. Building the code, running the test suite, comparing the results, changing the code, repeat. Some of the issues are likely bugs in the original code (that it fixed) but since I was going for byte-for-byte perfection it had to re-introduce them.
The issues you describe I have seen but not with the right technology and not in a while.