In practice, the output isn't "pure chaos and randomness" if you ground the model with tools. Test cases and explicit instructions do a fantastic job of keeping it focused and moving down the right path.
If I see an LLM consistently producing something I don't like, I'll either add the correct behavior to the prompt, or create a tool that tells it whether it made a mistake, and prompt it to call that tool after each major change.
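A minimal sketch of that verification tool, in Python. The function name `run_checks` and the choice of running a shell command as the check are my own assumptions, not a specific implementation from this post; the idea is just to wrap a test suite (or any verifier) so its pass/fail result can be handed back to the model as a tool response after each change.

```python
import subprocess
import sys

def run_checks(command: list[str]) -> str:
    """Run a verification command (e.g. a test suite) and return a short
    pass/fail report the model can read as a tool result.

    `command` is any runnable command line, such as ["pytest", "-q"].
    (Hypothetical helper for illustration.)
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return "PASS: all checks succeeded."
    # Surface the tail of the output so the model sees the actual failure.
    tail = (result.stdout + result.stderr).strip().splitlines()[-20:]
    return "FAIL:\n" + "\n".join(tail)

# Example: a one-line script stands in for a real test suite here.
print(run_checks([sys.executable, "-c", "assert 1 + 1 == 2"]))
```

The key design choice is that the tool returns a terse, machine-readable verdict plus just enough failure detail for the model to act on, rather than dumping the entire test log into context.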