There is already software that runs unit tests on LLM output and re-runs the prompt until they pass. As the models get better and the tooling improves, a lot of programming will become specifying constraints on the program you want and letting the AI explore the latent space until it finds a solution. You then evaluate that solution and provide more detailed constraints, repeating until it does everything you want.
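A minimal sketch of that loop, to make the idea concrete. Everything here is hypothetical: `fake_model` stands in for a real LLM call, and `passes` and `generate_until_passing` are illustrative names, not the API of any existing tool.

```python
from typing import Callable, Optional


def passes(source: str, tests: Callable[[dict], None]) -> bool:
    """Execute candidate source, then run the tests against its namespace."""
    namespace: dict = {}
    try:
        # exec is only acceptable for trusted or sandboxed code.
        exec(source, namespace)
        tests(namespace)
        return True
    except Exception:
        # Any error or failed assertion counts as a failing candidate.
        return False


def generate_until_passing(
    generate: Callable[[int], str],
    tests: Callable[[dict], None],
    max_attempts: int = 5,
) -> Optional[str]:
    """Ask for candidates until one passes, or give up after max_attempts."""
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if passes(candidate, tests):
            return candidate
    return None


def fake_model(attempt: int) -> str:
    """Hypothetical stand-in for a model call: first draft wrong, second right."""
    drafts = [
        "def add(a, b): return a - b",
        "def add(a, b): return a + b",
    ]
    return drafts[min(attempt, len(drafts) - 1)]


def add_tests(ns: dict) -> None:
    """The constraints on the program we want, written as assertions."""
    assert ns["add"](2, 3) == 5
    assert ns["add"](-1, 1) == 0


result = generate_until_passing(fake_model, add_tests)
print(result)
```

Tightening the constraints then just means adding assertions to `add_tests` and running the loop again.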
You get it to write the tests. Maybe in Cucumber, so you can check and edit them by reading the English. Maybe you use a competitor's model to write the tests, since it is then less likely to make the same error in both the code and the tests. Or have them written independently and take the best of three to spot errors.
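One way to read "best of three", sketched under assumptions: three test suites are written independently (say, by different models), the candidate is accepted if most of them pass, and any suite that disagrees with the majority gets flagged for a human to look at. The names `majority_passes` and `outliers` and the three toy suites are hypothetical.

```python
from typing import Callable, List

# A suite takes the function under test and reports pass/fail.
Suite = Callable[[Callable[[int], int]], bool]


def verdicts(candidate: Callable[[int], int], suites: List[Suite]) -> List[bool]:
    """Run every suite against the candidate."""
    return [suite(candidate) for suite in suites]


def majority_passes(candidate: Callable[[int], int], suites: List[Suite]) -> bool:
    """Accept the candidate if more than half of the suites pass."""
    results = verdicts(candidate, suites)
    return sum(results) * 2 > len(results)


def outliers(candidate: Callable[[int], int], suites: List[Suite]) -> List[int]:
    """Indexes of suites that disagree with the majority verdict."""
    results = verdicts(candidate, suites)
    majority = sum(results) * 2 > len(results)
    return [i for i, ok in enumerate(results) if ok != majority]


def square(x: int) -> int:
    return x * x


# Three independently written suites; the third contains a mistake.
def suite_a(f: Callable[[int], int]) -> bool:
    return f(3) == 9


def suite_b(f: Callable[[int], int]) -> bool:
    return f(-2) == 4


def suite_c(f: Callable[[int], int]) -> bool:
    return f(2) == 8  # wrong expectation: this is the cube, not the square


suites = [suite_a, suite_b, suite_c]
print(majority_passes(square, suites))
print(outliers(square, suites))
```

If two of the three suites fail instead, the same vote points at the code rather than at a test.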