I think "really good" is where we cross over from LLM to AGI, in that it can't just be a fancy autocomplete. It has to have a decent model of its readers, one it can test various prose and structure options against to figure out which ones are "really good" for particular readers.