This regression towards the mean is still very much a feature of the newer models, in my experience. I don't see how a model that predicts the most likely next word, given the preceding context and what it learned from its training corpus, could avoid some bias towards non-novelty and banality.
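Here's a toy sketch of that intuition, using made-up words and numbers rather than any real model. Greedy decoding always returns the single most probable continuation. Sampling with a higher temperature flattens the distribution and gives the rarer words more of a chance. `NEXT_WORD_LOGITS`, `greedy` and `sample` are hypothetical names for illustration only. Real models and their decoding settings vary.

```python
import math
import random

# Hypothetical scores (logits) for the word following "The weather today is".
# Made-up numbers for illustration, not taken from any real model.
NEXT_WORD_LOGITS = {
    "nice": 3.0,
    "sunny": 2.5,
    "cold": 2.0,
    "luminous": 0.5,
    "treacherous": 0.2,
}


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens the distribution."""
    scaled = {w: score / temperature for w, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def greedy(logits):
    """Always pick the single most likely word, i.e. the 'safest' continuation."""
    return max(logits, key=logits.get)


def sample(logits, temperature, rng):
    """Draw one word at random, weighted by the temperature-scaled probabilities."""
    probs = softmax(logits, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    print("greedy pick:", greedy(NEXT_WORD_LOGITS))
    for t in (0.5, 1.0, 1.5):
        p = softmax(NEXT_WORD_LOGITS, t)
        unusual = p["luminous"] + p["treacherous"]
        print(f"T={t}: P(unusual word) = {unusual:.3f}")
    print("one sample at T=1.0:", sample(NEXT_WORD_LOGITS, 1.0, random.Random(0)))
```

With these numbers, greedy decoding never produces "luminous" or "treacherous" at all. Even plain sampling at T=1.0 picks one of them only about 7% of the time, so the output gravitates towards the bland options unless something actively pushes against that.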