Early this year my son was playing chess which motivated me to write a chess program. I wrote one in Python really quickly that could challenge him. I was thinking of writing one I could bring to the chess club which would have had to have respected time control and with threads in Java this would have been a lot easier. I was able to get the inner loop to generate very little garbage in terms of move generation and search and evaluation with code that was only slightly stilted. To get decent play though I would have needed transposition table that were horribly slow using normal Java data structures but it could have been done off heap with something that would have looked nice on the outside but done it the way C would have done it in the inside.
I gave up because my son gave up on chess and started building and playing guitars in all his spare time.
Chess is a nice case of specialized programming where speed matters and it is branchy and not numeric.