Well, it's a breadth & depth problem, isn't it?
Humans are nonsensical, but at somewhat predictable error rates per domain and per individual. So you hire people with the skillsets, domain expertise, and error rates you need.
An LLM is nonsensical in a completely random way from prompt to prompt. It's sort of like talking into a telephone where sometimes Einstein is on the other end, and sometimes it's a drunken child. You have no idea when you pick up the phone which way it's going to go.
We feed these things nearly the entirety of human knowledge, and the output still feels rather random.
LLMs have all that information and still have a ~10% chance of messing up a simple mathematical comparison that an average 12-year-old would get right.
Other times we delegate much more complex tasks to LLMs and they work great!
But given the nondeterminism, it becomes hard to delegate important tasks when you can't check the work.