Thanks for making this more precise. For imperfect-information games in general, I agree a deterministic equilibrium is unlikely to exist, and I tend to agree in the case of poker -- but I recall a paper showing you can get something like 98% of the equilibrium utility in poker subgames with a deterministic strategy, which could make deterministic play practical. (Can't find the paper now.)
> I have no idea what you mean by "online search"
Continual resolving, as done in DeepStack [1].
> or "mechanism to ensure strategy consistency"
The gadget game introduced in [3], which continual resolving uses to keep the resolved subgame strategy consistent with earlier play.
> "it's likely mixed between call and a fold"
Being imprecise like this would arguably not result in super-human play.
> Adding some form of RNG to LLM is trivial as well and already often done (temperature etc.)
But that randomness lives in token space. I'd be curious to see a demonstration of an LLM sampling from a specified distribution (e.g. a uniform one) purely in token space, without external tool calling. Can you make an LLM sample an integer uniformly from 1 to 10, or from an arbitrary interval, e.g. 223 to 566, without an external tool?
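To make the distinction concrete, here is a minimal sketch (with made-up toy logits) of what temperature sampling actually randomizes: it perturbs the model's next-token distribution, so the resulting randomness follows whatever logits the model emits, not a uniform distribution over an interval the user asked for.

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample a token index from softmax(logits / temperature).

    This is the only place randomness enters: the draw is over the
    model's token distribution, which is generally far from uniform.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy next-token logits for digit tokens "1".."10" (illustrative values,
# not from any real model): the model strongly prefers a few tokens.
logits = [2.0, 0.5, 0.1, 0.1, 1.5, 0.1, 3.0, 0.1, 0.1, 0.2]

counts = [0] * len(logits)
for _ in range(10_000):
    counts[sample_token(logits, temperature=1.0)] += 1

# The empirical counts are heavily skewed toward the high-logit tokens,
# i.e. nothing here approximates "uniform over 1..10".
print(counts)
```

Raising the temperature flattens the distribution but only toward whatever the logits allow; it never turns the draw into the exact uniform (or otherwise specified) distribution a mixed poker strategy requires.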
> You can have as much training data for poker as you have for chess. Just use a very strong program that approximates the equilibrium and generate it.
You don't need an LLM under such a scheme -- a k-NN or some other simple approximation would do. But any strategy/value approximation runs into the very same strategy-inconsistency problem DeepStack had to solve with gadget games [5]. During play you will very quickly enter a subgame not covered by your training data, as poker has ~10^160 states.
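A toy sketch of the k-NN idea, under assumed simplifications: each situation is reduced to a hypothetical two-feature encoding, and the "training data" is a handful of solver-generated action distributions. The lookup itself is trivial; the point is that nothing in it guarantees the averaged strategy is consistent across subgames, which is what the gadget-game construction addresses.

```python
import math

# Hypothetical toy training data: (feature vector, solver's action
# distribution). Features might encode e.g. hand strength and pot odds;
# the numbers here are illustrative, not from any real solver.
training = [
    ((0.9, 0.8), {"raise": 0.7, "call": 0.3, "fold": 0.0}),
    ((0.5, 0.4), {"raise": 0.2, "call": 0.5, "fold": 0.3}),
    ((0.1, 0.2), {"raise": 0.0, "call": 0.2, "fold": 0.8}),
]

def knn_strategy(features, k=2):
    """Average the action distributions of the k nearest stored states.

    Interpolates between solver outputs, but offers no consistency
    guarantee: the returned mixture need not match what an equilibrium
    strategy would play given the history that reached this state.
    """
    nearest = sorted(training, key=lambda row: math.dist(row[0], features))[:k]
    actions = nearest[0][1].keys()
    return {a: sum(strat[a] for _, strat in nearest) / k for a in actions}

print(knn_strategy((0.8, 0.7)))
```

With ~10^160 states, any such table covers a vanishing fraction of the game, so almost every lookup is an extrapolation of exactly this kind.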
> The reason both games are hard for LLMs is that they require precision and LLMs are very bad at precision.
How do you define "precision"?
> I am not sure which game is easier to teach an LLM to play well. I would guess poker.
My guess is chess, because there is more training data and you don't need to construct gadget games or do ReBeL-style randomization [4] to ensure strategy consistency [5].
[3] https://arxiv.org/pdf/1303.4441