I've played a bit with a workflow where Claude generates a TLA+ model, I review it, Claude reviews it, and then Claude checks it for alignment with the actual system. The code in the system can be annotated with comments that link back to the model, and vice versa, to make human and AI review easier.
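To make that concrete: the cross-links are nothing fancier than mirrored comments. A minimal sketch of the model side (the module name, action, and file path are all hypothetical, not from a real system):

    ---- MODULE SessionLink ----
    VARIABLE sessions

    Init == sessions = {}

    \* Code: api/session.py :: create_session()  (hypothetical path; the
    \* code side would carry the mirror comment
    \* "# Model: CreateSession in SessionLink.tla")
    CreateSession(id) == sessions' = sessions \cup {id}

    Next == \E id \in {"s1", "s2"} : CreateSession(id)
    ====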
I think it's highly promising; I might even call it a success. For example, I was able to express a bug in the system, and Claude was able to find me a property that would reveal it in the model, then revise the model to eliminate the bug, and then revise the code accordingly.
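The actual bug and property belong to that system, so here is a hypothetical stand-in of the same shape: a read path that forgets to check revocation, and the action property whose violation makes TLC hand you a counterexample trace (every name here is invented for illustration):

    ---- MODULE RevokeSketch ----
    EXTENDS Naturals

    VARIABLES valid,   \* is the single modeled session still valid?
              reads    \* how many reads that session has performed

    Init == valid = TRUE /\ reads = 0

    Revoke == valid' = FALSE /\ UNCHANGED reads

    \* The bug: the read path never consults `valid`.
    Read == reads' = reads + 1 /\ UNCHANGED valid

    Next == Revoke \/ Read

    \* The revealing property: once revoked, the read counter freezes.
    \* Checked with PROPERTY NoReadAfterRevoke in the TLC config;
    \* TLC reports the Revoke-then-Read trace as a violation.
    NoReadAfterRevoke ==
        [][valid = FALSE => reads' = reads]_<<valid, reads>>
    ====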
Additionally, with OpenAI, I had the LLM generate a TLA+ model of the API of a small guest management system of mine: which endpoints a client can request, which tokens it learns from them (there are session ids, event ids, guest ids, and invitation ids that interact in interesting ways, since guests are not required to have an invitation to the system, and they can have an account protected only by a magic link, which results in different visibility of other people's contact details), and what data it can or cannot access with those tokens.
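In the same spirit, here is a much-reduced sketch of what such a token model can look like (mine, not the generated one, keeping only session tokens): the client accumulates tokens by calling endpoints, and an invariant states it never reads contact details it holds no token for.

    ---- MODULE GuestApiSketch ----
    CONSTANT Guests   \* e.g. {g1, g2} in the TLC config

    VARIABLES tokens,        \* tokens the client has learned so far
              contactsSeen   \* guests whose contact details it has read

    Init == tokens = {} /\ contactsSeen = {}

    \* A magic-link login yields a session token for one guest.
    Login(g) ==
        /\ tokens' = tokens \cup {[kind |-> "session", guest |-> g]}
        /\ UNCHANGED contactsSeen

    \* The contacts endpoint answers only to a matching session token.
    ReadContacts(g) ==
        /\ [kind |-> "session", guest |-> g] \in tokens
        /\ contactsSeen' = contactsSeen \cup {g}
        /\ UNCHANGED tokens

    Next == \E g \in Guests : Login(g) \/ ReadContacts(g)

    \* INVARIANT: no contact details without a corresponding token.
    NoLeak ==
        \A g \in contactsSeen :
            [kind |-> "session", guest |-> g] \in tokens
    ====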
I still haven't verified that one manually, but it seems rather promising as well.