I suspect there is a law of large numbers / central limit theorem–type result here: Shannon entropy is asymptotically optimal for randomly chosen word lists, even lists generated by state machines such as gibberish generators that nearly output English words. In other words, I conjecture that configurations like yours are rare for long lists.
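To make the quantity concrete, here is a minimal sketch of the entropy heuristic under discussion: the Shannon entropy of the partition a guess induces on a word list, computed over a hypothetical six-word toy list and using a simplified yellow rule that ignores letter multiplicity.

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Wordle-style feedback per letter: 2 = green, 1 = yellow, 0 = gray.
    Simplified: a letter is yellow if it appears anywhere in the answer,
    ignoring multiplicity."""
    pattern = []
    for g, a in zip(guess, answer):
        if g == a:
            pattern.append(2)
        elif g in answer:
            pattern.append(1)
        else:
            pattern.append(0)
    return tuple(pattern)

def entropy(guess, words):
    """Shannon entropy (in bits) of the partition the guess induces
    on the candidate word list."""
    counts = Counter(feedback(guess, w) for w in words)
    n = len(words)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy list chosen for illustration only.
words = ["crane", "slate", "trace", "bingo", "vivid", "mamma"]
best = max(words, key=lambda g: entropy(g, words))
```

The entropy-maximizing guess is the one whose feedback pattern splits the remaining candidates most evenly; the conjecture above is that for long random-ish lists this greedy criterion is asymptotically as good as anything else.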
Early in my career, I was naive enough to code up Gröbner bases with a friend, to tackle problems in algebraic geometry. I didn't yet know that computer scientists at MIT had tried random equations and seen horrid running times, and that other computer scientists at MIT had established special cases that are complete for exponential space. Our first theorem explained why algebraic geometers were lucky here. This is a trichotomy one often sees: "Good reason for asking" / "Monkeys at a keyboard" / "Troublemakers at a demo".
Languages evolve like codes in coding theory, maintaining a Hamming distance between words to enhance intelligibility. It could well be that the Wordle dictionary behaves quasirandomly, more uniformly spaced than a truly random dictionary, so Shannon entropy behaves better than expected.
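One could probe this conjecture empirically. A minimal sketch, using a hypothetical toy list standing in for the real Wordle dictionary: compare the histogram of pairwise Hamming distances for dictionary words against the same histogram for uniformly random five-letter strings.

```python
import random
import string
from itertools import combinations

def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def distance_profile(words):
    """Histogram of pairwise Hamming distances over a word list:
    index d holds the number of pairs at distance d."""
    hist = [0] * (len(words[0]) + 1)
    for u, v in combinations(words, 2):
        hist[hamming(u, v)] += 1
    return hist

# Toy stand-in for the real dictionary; illustration only.
dictionary = ["crane", "slate", "trace", "bingo", "vivid", "mamma"]
random.seed(0)
uniform = ["".join(random.choices(string.ascii_lowercase, k=5))
           for _ in range(len(dictionary))]

profile_dict = distance_profile(dictionary)
profile_rand = distance_profile(uniform)
```

If the "quasirandom, more uniformly spaced" picture is right, the real dictionary's profile should concentrate away from very small distances (few near-collisions) relative to the uniform baseline.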