The Secret Meeting Where Mathematicians Struggled to Outsmart AI (opens in new tab)

(scientificamerican.com)

47 pointsfmihaila9mo ago5 comments

5 comments

    > While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that the o4-mini’s results might be trusted too much. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”

I think there is a pitfall of designating a uniform categorization of “reasoning” like in this article; it is not surprising to hear that models are good at casting a wide net in fitting many different ideas together by association, however the subtle pitfalls in assuming that pieces fit together without unexpected interactions is something which require formal reasoning through instead of just correlating literature.

Reubend9mo ago

> “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”

Proof validation is the perfect solution to this, and indeed I would love to see future improvements to LLMs which allow them to formalize their proofs with a feedback loop from something like Lean or Coq so that they can ensure that hallucinations haven't occurred.

alimw9mo ago

You can already try this in Cursor. It doesn't work too well right now but perhaps that's just because noone has tuned the loop.

AlexErrant9mo ago

> “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,” he says. He asked o4-mini to solve the question... o4-mini presented a correct but sassy solution

I wonder who gets first author credits on that paper.

ColinWright9mo ago

It would be nice to know what the problem actually was.

j / k navigate · click thread line to collapse