The tests it writes in my experience are extremely terrible, even with verbose descriptions of what they should do. Every single test I've ever written with an LLM I've had to modify manually to adjust it or straight up redo it. This was as recent as a couple months ago for a C# MAUI project, doing playwright-style UI-based functionality testing.
I'm not sure your AST idea would work for my scenario. I'd be wanting to convert XNA game-play code to PhaserJS. It wouldn't even be close to 95% similar. Several things done manually in XNA would just be automated away with PhaserJS built-ins.