I've played a bit with a workflow where Claude generates a TLA+ model, I review it, Claude reviews it, and then Claude checks it for alignment with the actual system. The code in the system can be annotated with comments that link back to the model, and vice versa, to make human and AI review easier.
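To make that concrete: the cross-links are nothing fancier than mirrored comments. A minimal sketch of the model side (the module name, action, and file path are all hypothetical, not from a real system):

    ---- MODULE SessionLink ----
    VARIABLE sessions

    Init == sessions = {}

    \* Code: api/session.py :: create_session()  (hypothetical path; the
    \* code side would carry the mirror comment
    \* "# Model: CreateSession in SessionLink.tla")
    CreateSession(id) == sessions' = sessions \cup {id}

    Next == \E id \in {"s1", "s2"} : CreateSession(id)
    ====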
I think it's highly promising; I might even call it a success. For example, I was able to express a bug in the system, and Claude was able to find me a property that would reveal it in the model, then revise the model to eliminate the bug, and then revise the code accordingly.
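The actual bug and property belong to that system, so here is a hypothetical stand-in of the same shape: a read path that forgets to check revocation, and the action property whose violation makes TLC hand you a counterexample trace (every name here is invented for illustration):

    ---- MODULE RevokeSketch ----
    EXTENDS Naturals

    VARIABLES valid,   \* is the single modeled session still valid?
              reads    \* how many reads that session has performed

    Init == valid = TRUE /\ reads = 0

    Revoke == valid' = FALSE /\ UNCHANGED reads

    \* The bug: the read path never consults `valid`.
    Read == reads' = reads + 1 /\ UNCHANGED valid

    Next == Revoke \/ Read

    \* The revealing property: once revoked, the read counter freezes.
    \* Checked with PROPERTY NoReadAfterRevoke in the TLC config;
    \* TLC reports the Revoke-then-Read trace as a violation.
    NoReadAfterRevoke ==
        [][valid = FALSE => reads' = reads]_<<valid, reads>>
    ====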
Additionally, with OpenAI, I had the LLM generate a TLA+ model of the API of a small guest management system of mine: which endpoints a client can request, which tokens it learns from them (there are session ids, event ids, guest ids, and invitation ids that interact in interesting ways, since guests are not required to have an invitation to the system, and they can have an account protected only by a magic link, which results in different visibility of other people's contact details), and what data it can or cannot access with those tokens.
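In the same spirit, here is a much-reduced sketch of what such a token model can look like (mine, not the generated one, keeping only session tokens): the client accumulates tokens by calling endpoints, and an invariant states it never reads contact details it holds no token for.

    ---- MODULE GuestApiSketch ----
    CONSTANT Guests   \* e.g. {g1, g2} in the TLC config

    VARIABLES tokens,        \* tokens the client has learned so far
              contactsSeen   \* guests whose contact details it has read

    Init == tokens = {} /\ contactsSeen = {}

    \* A magic-link login yields a session token for one guest.
    Login(g) ==
        /\ tokens' = tokens \cup {[kind |-> "session", guest |-> g]}
        /\ UNCHANGED contactsSeen

    \* The contacts endpoint answers only to a matching session token.
    ReadContacts(g) ==
        /\ [kind |-> "session", guest |-> g] \in tokens
        /\ contactsSeen' = contactsSeen \cup {g}
        /\ UNCHANGED tokens

    Next == \E g \in Guests : Login(g) \/ ReadContacts(g)

    \* INVARIANT: no contact details without a corresponding token.
    NoLeak ==
        \A g \in contactsSeen :
            [kind |-> "session", guest |-> g] \in tokens
    ====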
I still haven't verified that one manually, but it seems rather promising as well.