on the flip side if you’re literally just using a bare bones harness on top of a stochastic parrot, of course stochastic errors accumulate.
theres a lot of ways for improving text faithfulness through harness tool designs, and my incremental experiments seem promising.
but unless work is gated on shit like “the script used must type checked ghc haskell or lean4”, unsupervised stuff is gonna decay