> It’s not even trying to be competitive, it’s just guessing how the game will continue. If you blunder, it might guess that this must be a game between two blundering fools, and play accordingly.
In a certain sense, GPT-2 is optimized to "look good to people interested in AI." Above all else it tries to generate plausibly human-looking output, while remaining completely oblivious to any other goal. That makes it an interesting fit for scenarios with objective scoring criteria: it may never be "good" at the scenario, only entertaining to human observers.
One method would be to just take the 3rd or 4th best move option. It wouldn't be a winning strategy but it would probably be pretty surprising and still moderately effective.
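Something like this would do it, using python-chess with any UCI engine (the Stockfish path and the depth are just placeholders):

    # Sketch: deliberately play the engine's 3rd/4th best move via MultiPV.
    # Assumes python-chess and a UCI engine binary on PATH; depth is arbitrary.
    import chess
    import chess.engine

    def nth_best_move(board, engine, n=3, depth=15):
        # n=2 -> 3rd best, n=3 -> 4th best (or the weakest line returned,
        # if the engine reports fewer than n+1 lines for this position).
        infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=n + 1)
        return infos[-1]["pv"][0]

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    board = chess.Board()
    print(nth_best_move(board, engine))
    engine.quit()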
Also, I'd imagine that the most surprising moves would require the opponent to find a very precise series of moves to avoid a losing position. While the best moves usually improve your position slowly, the most surprising moves might polarize the position: they give the opponent a chance to improve, but they also make a costly blunder more likely.
In that sense you could look for moves that leave the opponent the fewest positive-expected-value replies, rather than moves that minimize the EV of the opponent's best response.
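A rough way to score that, again with python-chess and a UCI engine (the depths and the centipawn threshold are arbitrary choices):

    # Sketch: among the engine's top candidates, prefer the one that leaves the
    # opponent the fewest replies whose eval (from the opponent's side) stays
    # above some threshold -- i.e. the fewest "positive EV" responses.
    import chess
    import chess.engine

    def count_ok_replies(board, engine, threshold_cp=-50, depth=10):
        ok = 0
        for reply in board.legal_moves:
            board.push(reply)
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            # Score from the replying side's point of view.
            cp = info["score"].pov(not board.turn).score(mate_score=100000)
            if cp >= threshold_cp:
                ok += 1
            board.pop()
        return ok

    def most_polarizing_move(board, engine, candidates=5):
        infos = engine.analyse(board, chess.engine.Limit(depth=12), multipv=candidates)
        scored = []
        for info in infos:
            move = info["pv"][0]
            board.push(move)
            scored.append((count_ok_replies(board, engine), move))
            board.pop()
        return min(scored, key=lambda t: t[0])[1]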
Big fan of the Lisp too. Gorgeous code.
It is surprising to me that you can predict optimal/strong engine moves with 27% accuracy using a completely trivial linear model, i.e. a single matrix multiplication.
I wonder how well it would compete with this GPT-2 engine.
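For reference, such a model really is just one matrix: one-hot the board, multiply, and take the best-scoring legal (from, to) pair. A minimal numpy sketch; the 64x13 encoding and the from-square/to-square output space are my assumptions, and W would have to come from training (e.g. multinomial logistic regression on engine moves):

    import numpy as np
    import chess

    PLANES = 13  # 6 white piece types, 6 black, plus "empty"

    def encode(board):
        x = np.zeros((64, PLANES), dtype=np.float32)
        for sq in chess.SQUARES:
            piece = board.piece_at(sq)
            if piece is None:
                x[sq, 12] = 1.0
            else:
                x[sq, piece.piece_type - 1 + (0 if piece.color else 6)] = 1.0
        return x.reshape(-1)  # 832-dim board vector

    W = np.zeros((64 * PLANES, 64 * 64), dtype=np.float32)  # learned weights

    def predict_move(board):
        logits = encode(board) @ W  # the single matrix multiplication
        legal = {(m.from_square, m.to_square): m for m in board.legal_moves}
        best = max(legal, key=lambda ft: logits[ft[0] * 64 + ft[1]])
        return legal[best]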
You're right that coming up with a token mapping could help things. It's a bit tricky to do right now. Your options for fitting a custom vocab seem to be "use sentencepiece to fit a vocab, then modify the gpt-2 codebase to use the sentencepiece library for decoding".
I am honestly not sure if the output of sentencepiece is compatible with traditional encoders. What I mean is, it doesn't seem to generate an encoder.json + vocab.bpe file. It seemed to be some other kind of format. So I'm not sure if the tooling that has evolved around OpenAI's encoder format would be applicable there. I really don't know, though.
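For what it's worth, fitting the vocab itself is the easy part; it's wiring the resulting .model file into the gpt-2 encoder that gets fiddly. Something along these lines, with hypothetical file names and vocab size:

    # Sketch: fit a BPE vocab on the games file with sentencepiece.
    # Note it emits a .model/.vocab pair, not encoder.json + vocab.bpe.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input="kingbase-ftfy.txt",   # one game per line
        model_prefix="chess_bpe",
        vocab_size=8000,
        model_type="bpe",
    )

    sp = spm.SentencePieceProcessor()
    sp.load("chess_bpe.model")
    print(sp.encode_as_pieces('[Result "0-1"] 1. e4 c5 2. Nf3'))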
According to this slatestarcodex comment, someone got superior results using only algebraic notation (which looks like g1f3 instead of Nf3): https://www.reddit.com/r/slatestarcodex/comments/el87vo/a_ve...
Another extension that might help is to periodically inject the full FEN board state. This was the format we were going to try next, which injects the full FEN after every move: https://gist.github.com/shawwn/318606c112774ad070f94de9c8288...
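As a rough illustration of that format, here's how you could build the training text from a PGN dump with python-chess (file names are placeholders, and the exact layout in the gist may differ):

    # Sketch: interleave the full FEN after every move when writing training text.
    import chess.pgn

    with open("kingbase.pgn") as pgn, open("train.txt", "w") as out:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            board = game.board()
            tokens = ['[Result "%s"]' % game.headers.get("Result", "*")]
            for i, move in enumerate(game.mainline_moves()):
                if i % 2 == 0:
                    tokens.append("%d." % (i // 2 + 1))
                tokens.append(board.san(move))        # SAN before pushing the move
                board.push(move)
                tokens.append("{ %s }" % board.fen())  # inject the full board state
            out.write(" ".join(tokens) + "\n")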
I'm so happy to get to work with GPT-2 1.5B. It's been a lot of fun to train.
By the way, if you like this kind of thing, you'll love Elo World. https://www.youtube.com/watch?v=DpXy041BIlA
It's not going to generate anything meaningful; it's meant to get close enough to realistic to be either funny or interesting.
I was very tickled.
Someone else tried this with GPT-2 a few months ago on algebraic notation and their engine seems to get to move 40 without blundering: https://www.reddit.com/r/slatestarcodex/comments/el87vo/a_ve...
Board state + algebraic notation might be the trick to make a strong engine.
For the record, you can do the same thing with a Hidden Markov Model (or hand-crafted rules) and the results won't be very different. Except that they won't elicit breathless articles about being a "step towards general intelligence".
Not to mention that the text generated by GPT-2 can often fool an online reader, whereas HMMs tend to be incoherent over the long term and don't refer back to the subject of a sentence the way GPT-2 often does.
I'm not saying you should believe the AI hype in the news media. But the paper does contain a lot of thorough analysis and comparison to the previous state of the art.
https://cdn.openai.com/better-language-models/language_model...
I guess people think "it's a powerful model, so it should do well on any task," but that's typically not the case for neural nets. I know what OpenAI claims about it doing a little bit of everything, but the machine translation benchmarks are borked, and I bet the question answering ones are too (which I confess I don't know much about).
It's a GPT-2 1.5B model trained on the KingBase 2019 dataset (>3M games between players rated above 2000 Elo). It was trained for 400k steps with batch size 6 on 140 TPUs over 24 hours, using a technique known as swarm training. Here's an incomplete whitepaper on swarm training: https://www.docdroid.net/faDq8Bu/swarm-training-v01a.pdf
The dataset is available here:

    gsutil cp gs://gpt-2-poetry/data/kingbase-ftfy.txt .

Each line is of the form:

    [Result "0-1"] [WhiteElo "2715"] [BlackElo "2793"] 1. e4 ...

Result 0-1 means black won; 1-0 means white won; 1/2-1/2 means a draw.
At runtime I prompt it with [Result "0-1"] and a high Elo for both white and black, to make it more likely to generate higher-level moves.
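Concretely, the prompting amounts to something like the sketch below; sample() is a stand-in for the actual GPT-2 sampling call, and the 2900 ratings are just "some high Elo":

    import re

    SAN = r"O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?"

    def build_prompt(moves_so_far="1."):
        return '[Result "0-1"] [WhiteElo "2900"] [BlackElo "2900"] ' + moves_so_far

    def next_move(moves_so_far, sample):
        continuation = sample(build_prompt(moves_so_far))  # model guesses how the game continues
        m = re.search(SAN, continuation)                   # take the first SAN-looking token
        return m.group(0) if m else None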
Our next project will be a GPT-2 IRC bot where you can talk with simulated people. We currently have one that wasn't trained for very long, yet the preliminary results are interesting enough to warrant a more serious time investment. https://twitter.com/theshawwn/status/1208667331230089216
Many people have asked for a thorough technical writeup, which I hope to make available soon. In the meantime, you can read some of our GPT-2 1.5B adventures here: https://www.gwern.net/GPT-2#gpt-2-1.5b
Lastly, someone on /r/slatestarcodex apparently did this exact same thing a few months ago. They trained on algebraic notation instead of PGN format, which is basically x1y1x2y2 coordinate form with no mention of the type of piece. It was also trained on 1B moves. The engine is superior to ours and can apparently reach move 40 without blundering, according to the replay. https://www.reddit.com/r/slatestarcodex/comments/el87vo/a_ve...
I have also been porting the stylegan2 codebase to TPUs to facilitate swarm training. We hope to train on a very large dataset like the entirety of danbooru2018. No promises, but results are interesting so far. https://twitter.com/theshawwn/status/1214245145664802817
I hope you all found this enjoyable. The GCE bill is currently $50, which I'm keeping an eye on. (Go subscribe to gwern's patreon to see more projects like this!)
If you happen to reproduce this, let me know.
Shocking. Our AI overlords will soon stumble into power, if we only point out where they're slipping up.
[0] https://twitter.com/theshawwn/status/1213559429293060099
The same algorithm could be applied here.