My pushback is limited to that the theoretical maximal degenerate behavior described in either of your comments is highly improbable in practice, with a lot of givens, such as reasonable parameters, reasonable model.
I.e. it will not
- give totally different answers due to seed changing.
- end up X% of the time, where X > 5 say it is impossible, and the other (100 - X)%, provide some solution.
I have integrated with GPT3.0/GPT3.5/GPT4 and revisions thereof via API, as well as Claude 2 and this week, Claude 3. I wrote a native inference solution that runs, among others, StableLM Zephyr 3B, Mistral 7B, and Mixtral 8x7B, and I wrote code that does inference, step by excruciating step, in a loop, on web via WASM, and via C++, tailored solutions for Android, iOS, macOS, Android, and Windows.