https://x.com/VictorTaelin/status/1811167900780175423
Can recommend taking a look at their recorded Twitch stream to see it in action.
Also, it's among the most demanding pure reasoning tasks you can set up, so working on theorem-proving AI may help improve reasoning and reliability more broadly.
This can be applied to questions like: Is your extension of a distributed-systems protocol (like Paxos) correct? Does my novel security system satisfy certain properties? Etc...
... basically, whenever you need to be sure that certain properties of your system hold.
You can verify only the critical parts, such as a data structure or algorithm, or you can verify higher-level parts of a system. All you're doing with math is thinking (out loud), and writing a proof is constructing a convincing argument. If you need to be certain that a thread doesn't leak addresses in shared memory to other threads, then you ought to take the time to think through how you're going to achieve that and prove that your solution works.
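As a toy example of verifying a critical property, here's a sketch in Lean 4 (lemma names like `List.reverse_cons` come from the core library, though exact names can vary by version):

```lean
-- Reversing a list preserves its length: a tiny instance of
-- proving a property of a critical data structure.
theorem reverse_preserves_length (xs : List α) :
    xs.reverse.length = xs.length := by
  induction xs with
  | nil => rfl
  | cons x xs ih =>
    -- (x :: xs).reverse = xs.reverse ++ [x], and appending [x] adds 1
    simp [List.reverse_cons, List.length_append, ih]
```

Once this compiles, the kernel (not a human reader) has confirmed the argument.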
10+ years out from a Finance PhD where I ended up using numerical methods because I really didn't have the math skills to prove closed-form solutions.
Would love to know, starting with a stochastic differential equation, how far you can go re: applying Itô's lemma and working through the math to get to a closed-form solution (using Lean).
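For reference, the statement of Itô's lemma such a derivation would formalize: if dX_t = μ(t, X_t) dt + σ(t, X_t) dW_t and f is twice continuously differentiable, then

```latex
df(t, X_t) = \left( \frac{\partial f}{\partial t}
           + \mu \frac{\partial f}{\partial x}
           + \tfrac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2} \right) dt
           + \sigma \frac{\partial f}{\partial x}\, dW_t
```

Last I checked, Mathlib's stochastic-calculus coverage was still early-stage, so formalizing a derivation like this end-to-end would be pioneering work rather than routine.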
Is the main advantage of Lean (ignoring LLM assistance) that you build up declarative code which, as you incrementally work on the proof, guarantees that the proof-so-far is correct?
So you're still "working through the math", but rather than un-executable math notation written on a pad, you have, "by induction", a guaranteed-valid argument up to the point you've reached in the proof?
Just trying to build a mental model of Lean > pen and pad.
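Roughly, yes. A minimal sketch of what "guaranteed valid so far" looks like in practice (the open goal is marked with `sorry`):

```lean
-- The skeleton type-checks as you go; `sorry` marks the one
-- remaining gap, and everything outside it is already verified.
theorem length_append (xs ys : List α) :
    (xs ++ ys).length = xs.length + ys.length := by
  induction xs with
  | nil => simp        -- base case: closed and checked immediately
  | cons x xs ih =>
    sorry              -- inductive step still open; Lean tracks the exact goal
```

If any checked step were wrong, Lean would reject it on the spot, which is the big difference from pen and pad.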
https://github.com/lecopivo/SciLean
To your last point: the idea is that numerical approximations can be introduced (introducing one asks for a proof of validity, though you can skip "the proving" via `sorry`) to go from un-executable math notation (in Lean 4) to executable code!
Whether the proof goes through doesn't affect the final executable.
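A generic illustration of that point (not SciLean's actual API; `approxSqrt` is a made-up example): a definition can carry a `sorry`-ed proof obligation and still compile and run.

```lean
-- A numerical approximation paired with a deferred correctness claim.
def approxSqrt (x : Float) : Float :=
  let step := fun y => (y + x / y) / 2  -- one Newton iteration
  step (step (step (x / 2 + 1)))

-- The validity proof is left as `sorry`; this emits a warning
-- but does not block compiling or running `approxSqrt`.
theorem approxSqrt_nonneg (x : Float) (h : 0 ≤ x) :
    0 ≤ approxSqrt x := by
  sorry

#eval approxSqrt 2.0  -- still evaluates despite the open proof
```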
Privately, I think Lean could be incredibly powerful if baked deep into an ML model's kernel/structure/training process. I think an AI proof of something like the Riemann Hypothesis may well be possible if you get enough resources behind it.
I read the article. It doesn't say anything about generating new proofs to train on. It only mentions scraping GitHub for Lean theorems + proofs.
Not sure of the feasibility of implementing Lean as a GPU kernel, if that's what you meant. I'm also not sure it makes sense to run Lean inside the (mostly matmul) training process. Using it to prepare training data seems more realistic, and that seems to be what AlphaProof does in its reinforcement step, if I'm not mistaken.
The problem is mostly that it's fairly intensive to code an efficient RL trainer for this, and even then it's expensive to run the training.
Still not convinced that LLMs do anything other than rearrange other people's work.
Effects can already be seen: The Washington Post used to display articles when found via Google; now you get a paywall. And I can no longer criticize them for it.
I’m not convinced that most people do anything other than rearrange other people’s work.
It's amazing how useful and powerful that is in certain contexts.