thomasahle 1y ago

You'd think that, but Timothy Gowers (the famous mathematician they worked with) wrote (https://x.com/wtgowers/status/1816509817382735986):

> However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.

So they didn't actually solve autoformalization, which is why they still needed humans to translate the input IMO 2024 problems.

The reason formalization is harder than you'd think is that there is no way to know whether you got it right. For proofs, you can use reinforcement learning and get a clear signal from the proof checker. We don't have a way to verify formalizations the same way.

thrdbndndn 1y ago

A small detail wasn't clear to me: for these incorrectly formalized problems, how do they get the correct answer as ground truth for training? Do they have a human solve them manually?

(For problems that actually come from "a huge database of IMO-type problems", by contrast, they already have answers.)

summerlight 1y ago

> A small detail wasn't clear to me: for these incorrectly formalized problems, how do they get the correct answer as ground truth for training? Do they have a human solve them manually?

A formal proof can be mechanically checked for correctness. We just don't know the answer in advance. Think of it as an extremely rigorous type system that typically requires really long type annotations, where the annotation itself is a complex program. So if AlphaProof happens to generate a proof that passes this checker, we know that it's correct.
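
To make the type-system analogy concrete, here is a minimal Lean 4 sketch (the theorem names are made up for illustration, and this is not taken from AlphaProof). The statement plays the role of the type, the proof is a term of that type, and the checker accepts the file only if the term type-checks:

```lean
-- The statement `n + 0 = n` is a type; `rfl` is a term of that type.
-- Lean accepts this because `n + 0` reduces to `n` by definition.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- A concrete arithmetic fact, checked by evaluation.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A bogus "proof" does not type-check, so the checker rejects it.
-- Uncommenting the next line produces an error:
-- theorem bogus (n : Nat) : n + 1 = n := rfl
```

Nobody needs to know the proof beforehand: whatever term the model produces, the checker either accepts it or rejects it.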

thrdbndndn 1y ago

Ah, thanks. That makes a lot of sense now.

adrianN 1y ago

You write proofs in a formal language that can be machine checked. If the checker is happy, the proof is correct (unless there is a bug in the checker, but that is unlikely).

raincole 1y ago

They said the incorrectly formalized ones are usually easier, so I assume they just hire humans to solve them in the old way until the AI is smart enough to solve these easier problems.

czl 1y ago

> I assume they just hire humans to solve…

An incorrectly formalized problem is simply a different problem, and a solution to any formalized problem is still useful for AI training, because such solutions can be mechanically checked for correctness without hiring humans. What requires humans is the initial formalization, since that is more of a language translation task: it requires nuance and judgment and is difficult to verify mechanically.
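
As a toy illustration in Lean 4 (the problem and all names here are invented for the example, not taken from the actual training data), suppose the informal problem says "for every n there is a larger even number" and the autoformalizer drops the evenness condition. The result is a different, easier statement, but a proof of it is still verified by the checker:

```lean
-- What the informal problem was meant to say (stated only, not proved here).
def intendedStatement : Prop :=
  ∀ n : Nat, ∃ m : Nat, n < m ∧ m % 2 = 0

-- A hypothetical misformalization that forgot the evenness condition.
-- It is not the intended problem, yet it is a perfectly valid theorem,
-- and the checker confirms the proof without any human involved.
theorem misformalized (n : Nat) : ∃ m : Nat, n < m :=
  ⟨n + 1, Nat.lt_succ_self n⟩
```

What the checker cannot tell you is that `misformalized` fails to capture the original problem. That judgment is the part that still needs a human.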

llwu 1y ago

> We don't have a way to verify formalizations the same way.

While there is no perfect method, it is possible to use the agent to determine whether the statement is false, has contradictory hypotheses, or has a suspiciously short proof.
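
A small Lean 4 sketch of the contradictory-hypotheses case (the statement is invented for illustration; I am assuming this is roughly the kind of check meant, not describing a known pipeline). If the hypotheses of a formalized statement can be shown to imply `False`, then any conclusion follows in one line, which is a strong hint that the formalization is broken:

```lean
-- A hypothetical bad formalization: no natural number satisfies both
-- n < 3 and 5 < n, so the hypotheses contradict each other.
theorem hypsContradictory (n : Nat) (h₁ : n < 3) (h₂ : 5 < n) : False :=
  absurd (Nat.lt_trans h₂ h₁) (by decide)

-- Consequently any conclusion, even an absurd one, has a trivial proof.
theorem suspiciouslyShort (n : Nat) (h₁ : n < 3) (h₂ : 5 < n) : n * n = 17 :=
  (hypsContradictory n h₁ h₂).elim
```

Passing such checks does not show a formalization is faithful; it only filters out some of the obviously broken ones.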
