Are they even trying to be good at that? Serious question; using LLMs as a logical processor are as wasteful and as well-suited as using the Great Pyramid of Giza as an AirBnB.
I've not tried this, but I suspect the best way is more like asking the LLM to write a COQ script for the scenario, instead of trying to get it to solve the logic directly.