(It is weird that somebody, not me, created a throwaway account to make the original comment. Likely they are involved in chess development, or know how quickly stat discussions go sideways.)
I'm not involved in chess, I just don't like long-term accounts.
You get a slightly different p-value because the ordering you chose is slightly different from mine. Compared to mine, it favors matchups where the draw probability is low.
I get you. I set the per-game draw probability to the observed draw rate, whereas you fixed the number of draws at the observed count. But really I just meant to point out that you were right (while giving a simulation that didn't involve explicit conditioning).
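For anyone following along, here is a minimal sketch of the two null models being compared. Everything concrete in it is an assumption for illustration: the match record (14 games, 9 draws, 4 wins and 1 loss for player A) is made up, the test statistic |wins minus losses| is my choice, and under the null each decisive game is a fair coin flip. Model 1 draws each game independently at the observed draw rate, so the number of decisive games varies between simulations; model 2 conditions on exactly the observed number of draws and only simulates the decisive games.

```python
import random
from math import comb

# Hypothetical match record (illustrative only, not data from this thread).
N_GAMES = 14
N_DRAWS = 9
A_WINS, A_LOSSES = 4, 1
OBS_MARGIN = abs(A_WINS - A_LOSSES)  # assumed test statistic: |wins - losses|


def sim_fixed_draw_prob(n_sims, rng):
    """Model 1: each game is a draw with probability equal to the observed
    draw rate; decisive games are fair coin flips. The draw count varies."""
    p_draw = N_DRAWS / N_GAMES
    hits = 0
    for _ in range(n_sims):
        margin = 0
        for _ in range(N_GAMES):
            if rng.random() < p_draw:
                continue  # draw: margin unchanged
            margin += 1 if rng.random() < 0.5 else -1
        if abs(margin) >= OBS_MARGIN:
            hits += 1
    return hits / n_sims


def sim_fixed_draw_count(n_sims, rng):
    """Model 2: condition on exactly the observed number of draws and
    simulate only the decisive games as fair coin flips."""
    n_decisive = N_GAMES - N_DRAWS
    hits = 0
    for _ in range(n_sims):
        wins = sum(rng.random() < 0.5 for _ in range(n_decisive))
        if abs(2 * wins - n_decisive) >= OBS_MARGIN:
            hits += 1
    return hits / n_sims


def exact_fixed_draw_count():
    """Exact p-value for model 2: wins ~ Binomial(n_decisive, 1/2)."""
    m = N_GAMES - N_DRAWS
    extreme = sum(comb(m, w) for w in range(m + 1) if abs(2 * w - m) >= OBS_MARGIN)
    return extreme / 2 ** m


if __name__ == "__main__":
    rng = random.Random(0)
    print("model 1 (fixed draw probability):", sim_fixed_draw_prob(20_000, rng))
    print("model 2 (fixed draw count):      ", sim_fixed_draw_count(20_000, rng))
    print("model 2, exact:                  ", exact_fixed_draw_count())
```

The two models generally give different p-values for the same record, because model 1 also counts outcomes with more or fewer decisive games than were actually observed, while model 2 only ever compares against the observed number of decisive games.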