I’ve iterated on this process hundreds of times, and even with a strong specification and tests, an LLM can meet the spec, pass all the tests, and still build the wrong thing.
This multi-agent workflow idea is worse than web3.
But it’s a YC thing, so someone will make money.