IME GPT-4 can't write a bug-free 10-line shell script. It's particularly poor at inferring unstated requirements - or at recognising the need to elicit them.
There's a general problem with LLMs: they're too eager to please. It shows up as confirmation bias. Embed a perspective in your prompt, and LLMs continue in the same vein.
You can, with careful prompting, try to provoke and prod the text generation into a more correct shape, but often it feels to me more like a game than productivity. I have to know the answer already to know how to ask the right questions and make the right corrections. So it feels like I'm supervising a child, and that I should be amazed it can do anything at all. And it is amazing; but for productivity outside tightly constrained environments (e.g. converting freeform dialogue into a filled-out bureaucratic form - I think this is close to an ideal use case), I struggle to see it scaling up much, from what I've seen so far.
For creativity - e.g. making up a story for a child - it's not bad. It became one of my favourite use cases after I discovered how bad it is at writing code.