That's not how you prove that code works properly and won't fail due to some obscure or unforeseen corner case. You need actual proof that's driven by the code's overall structure. Humans do this, at least informally, when they code. AIs can't do it with any reliability, especially not for non-trivial projects (for reasons that are quite structural and hard to change), so most coding agents simply iterate until their tests pass. That's not a robust methodology.
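
To make the point concrete, here is a small sketch of how a green test run can coexist with a wrong function. The function names and the test suite are hypothetical, made up purely for illustration. The first version is the kind of thing you might land on by iterating until the tests pass. The second is written by reasoning through the cases of the Gregorian leap-year rule.

```python
def is_leap_iterated(year: int) -> bool:
    # Hypothetical result of tweaking code until the tests below go green.
    return year % 4 == 0


def is_leap_reasoned(year: int) -> bool:
    # Derived from the rule itself: every 4th year is a leap year, except
    # century years, unless the century year is divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A plausible but incomplete test suite: both versions pass it.
for year, expected in [(2020, True), (2023, False), (2024, True), (2000, True)]:
    assert is_leap_iterated(year) == expected
    assert is_leap_reasoned(year) == expected

# The corner case the suite never exercised.
assert is_leap_iterated(1900) is True   # wrong: 1900 was not a leap year
assert is_leap_reasoned(1900) is False
```

Both versions pass the same tests, so the test results alone can't tell them apart. Only the case analysis behind the second one does.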