This makes sense to me based on personal experience. LLM's will do anything to pass tests and get a working result, and it will do very weird things in order to get there. For fun I've tried to get it to do stuff while being purposely ambiguous about the implementation details and sometimes the stuff it does makes me literally laugh out loud. It can write some very strange code.
But hey, the tests pass!
If I force it to use plan mode for everything and babysit it, it can work really well, but it's really just acting as a faster typer for me, which is great. But it requires an experienced dev steering it.