I don’t think we are anywhere close to doing that.
But how do you reduce the requirements for software to something so simple and elegant as chess rules? Is it foolish to assume that if we could have, we already would have? Even for humans, the process of writing software includes a lot of guess-and-check most of the time - the idea that you could sit down and think through every aspect of software, then describe it immaculately, then translate that description to a working solution with no bugs or review or need for course correction is just… it’s a pipe dream.
A good analogy might be how machines gradually replaced textile workers in the 19th century. Were the machines better? Was there a way to quantitatively measure the quality of their output? No. But at the end of the day, the companies that embraced the technology were more productive than those that didn't, and quality didn't drop enough (if it dropped at all) for customers to take their business elsewhere – so those companies won out.
The same will naturally happen in software over the next few years. You'd be a moron to hire a human expert for $200,000 to critique a cybersecurity-optimised model that costs maybe a hundredth of the cost of employing a human... And this would likely remain true even if we assume the human will catch the odd thing the model wouldn't, because there's no such thing as perfect security – it's always a trade-off between cost and acceptable risk.
Bookmark this and come back in a few years. I made similar predictions when ChatGPT first came out: that within a few years agents would be picking up tickets and raising PRs. Everyone said LLMs were just stochastic parrots and it would never happen. Well, now it has, and companies are writing more and more code with AI. At my company it's a little over 50% at the moment, and that figure increases every month.
Software is also not remotely similar to textiles. A subtle flaw in a piece of cloth won't cause potentially millions of dollars in damages, the way a bug in software (or in the automated loom itself) can.
No current technology is anywhere close to being able to automate 50% of PRs on any non-trivial application (which is not at all the same as saying that 50% of PRs merged at your startup happen to have an agent as the author). To assume that current models will get near 100% without massive model improvements is just that: an assumption.
My point about synthetic data is that with current technology we need orders of magnitude more data, and the only way we will get there is with synthetic data – which is much, much harder to generate for software applications than for chess games.
The point isn't that we need a quantitative measure of software quality for AI to be useful; it's that we need one for synthetic data generation to be viable – to give us those orders of magnitude more training data.
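To make the point concrete, here's a toy sketch (primality as a hypothetical stand-in task, not anything from the thread) of why a cheap, exact evaluator makes synthetic data easy: when a perfect verifier exists, you can label unlimited generated examples for free, the way chess rules label a self-play game as win/loss/draw. For real-world software, the verifier – "is this program correct?" – is itself the hard, expensive question.

```python
import random

def is_prime(n: int) -> bool:
    # Exact verifier: cheap to run, never wrong. Chess rules play the
    # same role for self-play games; software has no such oracle.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def synth_dataset(size: int, seed: int = 0):
    # The candidate generator can be dumb (random numbers here);
    # the verifier supplies a guaranteed-correct label for each one,
    # so labelled training data scales with compute, not human effort.
    rng = random.Random(seed)
    candidates = (rng.randrange(2, 10**6) for _ in range(size))
    return [(n, is_prime(n)) for n in candidates]

data = synth_dataset(1000)  # a thousand perfectly labelled examples, for free
```

The asymmetry is the whole argument: generation is trivial in both domains, but only the domain with a quantitative measure gets trustworthy labels at scale.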