Google's temperature of 2 with top_p at 1 still produces output that makes sense, so it doesn't work for me. I want to turn the knob to 5 or 10.
I'd guess SOTA models don't allow temperatures high enough because the results would scare people and could be offensive.
I usually run at a temperature 0.05 below the point where the model spouts an incoherent mess of Chinese characters, zalgo, and spam-email obfuscation.
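For anyone who wants to see what the knob actually does: temperature divides the logits before the softmax, so higher values flatten the distribution toward uniform, which is why the output eventually dissolves into noise. A minimal sketch (the logits are made-up illustrative numbers, not from any real model):

```python
import math

def softmax(logits, temperature=1.0):
    """Scale logits by 1/temperature before normalizing.

    Higher temperature flattens the distribution toward uniform;
    lower temperature concentrates mass on the top token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]  # hypothetical token logits
for t in (0.5, 1.0, 2.0, 10.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

At temperature 0.5 the top token takes ~98% of the mass; at 10 the three tokens are nearly uniform. Somewhere along that slide is the "0.05 below incoherent" sweet spot.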
Also, I really hate top_p. The best writing is when a single token is so unexpected, it changes the entire sentence. top_p artificially caps that level of surprise, which is great for a deterministic business process but bad for creative writing.
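To make the complaint concrete: top_p (nucleus sampling) keeps only the smallest set of highest-probability tokens whose cumulative mass reaches the threshold, zeroes out the rest, and renormalizes. A rare token below the cutoff can never be sampled, no matter how high the temperature. A rough sketch (probabilities are made up):

```python
def top_p_filter(probs, top_p):
    """Zero out every token outside the nucleus, then renormalize.

    The nucleus is the smallest set of highest-probability tokens
    whose cumulative probability reaches top_p.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [f / total for f in filtered]

# A "surprising" token at index 3 carrying 2% of the mass:
probs = [0.50, 0.30, 0.18, 0.02]
print(top_p_filter(probs, 0.95))  # index 3 is cut to 0.0
```

That 2% token is exactly the kind of once-in-fifty surprise that can turn a sentence, and with top_p = 0.95 its probability of appearing is zero, not 2%.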
top_p feels like Noam Chomsky's strategy to "strictly limit the spectrum of acceptable opinion, but allow very lively debate within that spectrum".