TDD gets the code to pass the tests, but the output is still RNG. As for goals, you can't foresee every stupid thing it will generate. It will look at a state machine and, rather than using the existing event structure, write its own loops and conditions. This is very different from human devs. No goal will help. You just keep yanking its chain until it generates what you described. It can't even put imports at the top as you asked. It can't help making circular refs in C++ despite being specifically told to use a hierarchical structure. Left alone, you will get a truly unstructured, random mess.
People keep making trivial apps from open-source examples and thinking they found god. Another dismissive comment and I swear.