
0 points · lozenge · 1y ago · 0 comments

The AI took a day on one of the problems, so it must have generated and discarded a lot of proofs that didn't work. How could it choose which one to submit as the answer, except by the objective fact of the proof passing in Lean?
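For readers unfamiliar with what "passing in Lean" means, here is a tiny sketch in Lean 4 using only the core library. The theorem names are made up for illustration and have nothing to do with the actual IMO problems. The checker accepts the first proof, and it would reject the commented-out attempt with an error instead of accepting it.

```lean
-- Accepted: the proof term has exactly the type the statement asks for.
theorem sum_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Rejected if uncommented: the term proves a different statement,
-- so Lean reports a type mismatch.
-- theorem bogus_example (a b : Nat) : a + b = a :=
--   Nat.add_comm a b
```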


8 comments · 3 top-level

snewman · 1y ago · 4 in thread

When tackling IMO problems, the hard part is coming up with a good approach to the proof. Verifying your proof (and rejecting your false attempts) is much easier. You'll know which one to submit.

(Source: I am a two-time IMO silver medalist.)

lupire · 1y ago

You are a human, not an AI. You know whether your idea seems related to the solution. The AI has thousands of ideas and doesn't know which are better. Graders shouldn't accept a thousand guesses grubbing for 1 point.

If verifying a good idea is easy, then the evidence shows that the AI didn't have good ideas for the other 2 problems.

badrunaway · 1y ago

We are talking about Lean proofs. Given a formal statement and a proof, Lean can verify whether it's correct or not. It's like generating computer programs to solve a problem: the difficulty lies in generating useful solutions/sub-solutions so that the search is effective. They achieve this by using Gemini as a Lean proof generator, i.e. a world-model LLM fine-tuned to generate Lean proofs more effectively.

Humans are even better at this, as you mention, but effectively the approach is similar: come up with a lot of ideas and see which one proves it.
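The generate-then-verify shape can be sketched with a toy stand-in. This is not AlphaProof's actual pipeline. Every name here (`propose`, `verify`, `search`) is hypothetical, the random proposer stands in for the model, and an arithmetic check stands in for Lean. The point is only that an unreliable generator plus a cheap exact checker never returns a wrong answer.

```python
import random
from typing import Iterator, Optional, Tuple

Candidate = Tuple[int, int]


def verify(n: int, cand: Candidate) -> bool:
    """Cheap, objective check (the role Lean plays for a proof)."""
    x, y = cand
    return x * x + y * y == n


def propose(n: int, rng: random.Random) -> Iterator[Candidate]:
    """Unreliable idea generator (the role the model plays); mostly wrong."""
    bound = int(n ** 0.5) + 1
    while True:
        yield rng.randint(0, bound), rng.randint(0, bound)


def search(
    n: int, max_attempts: int = 100_000, seed: int = 0
) -> Optional[Tuple[Candidate, int]]:
    """Return the first verified candidate and the attempt count, or None."""
    rng = random.Random(seed)
    for attempts, cand in enumerate(propose(n, rng), start=1):
        if verify(n, cand):
            return cand, attempts  # only a checked candidate is ever returned
        if attempts >= max_attempts:
            return None  # budget exhausted: submit nothing rather than a guess
    return None


if __name__ == "__main__":
    # 1105 can be written as a sum of two squares in several ways.
    print(search(1105))
    # 3 cannot, so the search gives up instead of returning a wrong answer.
    print(search(3, max_attempts=1000))
```

All the discarded candidates cost compute, but none of them can be submitted by mistake, which is the property being discussed above.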

saagarjha · 1y ago

I don't see why we should take your word for it, as opposed to just asking AlphaProof to comment instead.

topato · 1y ago

Well, he does have twice the number of silver medals... And can speak the English language... Although an AI attempting to speak with the human race entirely through an esoteric math-proving language would be an interesting take on the whole "humans make ET contact, interact through universal language... pause for half of the movie, military hothead wanting to blow it out of the sky, until pretty lady mathematician runs into the Oval Office waving a sheet of paper... OF MATH!" trope... But now it's a race between you and me to see who can write the screenplay first!

Agingcoder · 1y ago · 1 in thread

To some extent, what they do is stronger than what the other contestants do, who I understand don't formally prove their answers.

necovek · 1y ago

In maths, you have only solved a problem once you have built a formal solution, which is the proof.

E.g., if it asked you to do something computationally hard (when done by brute force; calculators are not allowed), and you wrote down the answer without deriving it using formal math knowledge, you'd get zero points, even if it was easy to prove the answer was correct.

That's why the IMO and many math exams won't accept a bare answer, even if it is correct.

Davidzheng · 1y ago

I think it proves lemmas, so it can submit all the lemmas it proved lol
