> First, the problems were manually translated into formal mathematical language for our systems to understand.
The non-geometry problems which were solved were all of the form "Determine all X such that…", and the resulting theorem statements are all of the form "We show that the set of all X is {foo}". The downloadable solutions from https://storage.googleapis.com/deepmind-media/DeepMind.com/B... don't make it clear whether the set {foo} was decided by a human during this translation step, or whether the computer found it. I want to believe that the computer found it, but I can't find anything to confirm. Anyone know?
> However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.
So didn't actually solve autoformalization, which is why they still needed humans to translate the input IMO 2024 problems.
The reason why formalization is harder than you think is that there is no way to know if you got it right. You can use Reinforcement Learning with proofs and have a clear signal from the proof checker. We don't have a way to verify formalizations the same way.
To you or me, sure. But I think the proof that it isn't for this AI system is that they didn't do it. Asking a modern LLM to "translate" something is a pretty solved problem, after all. That argues strongly that what was happening here is not a "translation" but something else, like a semantic distillation.
If you ask an AI (or a person) to decide the halting problem, they can't. If you "translate" the question into a specific example that does halt, they can run it and find out.
I'm suspicious, basically.
> While the problem statements were formalized into Lean by hand, the answers within the problem statements were generated and formalized by the agent.
However, it's unclear what initial format was given to the agents that allowed this step
To me, this sounds like AlphaProof receives a "problem", whatever that is (how do you formalize "determine all X such that..."? One is asked to show that an abstract set is actually some easily understandable set...). Then it generates candidate theorems, presumably in Lean, i.e. the set is {n : P(n)} for some formula or something. Then it searches for proofs.
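As a toy illustration (a made-up example, not one of the actual IMO formalizations): the answer set has to appear in the theorem statement itself, so either a human or the system must produce it before proof search can even begin.

```lean
-- Hypothetical toy version of a "determine all X" problem:
-- "determine all natural numbers n such that n ^ 2 = n".
-- The candidate answer {0, 1} is part of the statement, so whoever
-- writes this theorem has already "found" the answer; the proof
-- search only certifies it.
example : {n : ℕ | n ^ 2 = n} = {0, 1} := by
  sorry -- the system would have to find both the set and this proof
```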
I think if AlphaProof did not find {foo} but was given it, then it would be outrageous to claim that it solved the problem.
I am also very hyped.
> AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct.
still good progress nonetheless. won't call the system sufficient by itself tho.
Then he gave me a huge hint to the solution, after which it only took me a couple of hours to solve.
(Formalizing the solution is of course the hardest part, and might serve as a good masters dissertation I think)
But seriously, if you can't ask the LLM to solve the right question, you can't really expect it to give you the right answer unless you're really lucky. "I'm sorry, but I think you meant to ask a different question. You might want to check the homework set again to be sure, but here's what I think you really want."
Some people call this programming
Lean isn't just a formal language; it is also a theorem prover. Could the IMO participants use the nlinarith tactic? Could they use other tactics?
Of course not, they had to show their work!
Could they have mathematicians translate the problem statements into the formal language for them?
Of course not, they had to do it themselves. In "How to solve it" Polya stresses multiple times that formalizing the initial question is an important part of the process.
Then, the actual computational resources expressed in time are meaningless if one has a massive compute cloud.
I'm a bit dissatisfied with the presentation, same as with the AlphaZero comparison to an obsolete Stockfish version that has been debunked multiple times.
That means that "AI solves IMO problems better than 75% of the students", which is probably even more impressive.
But, "minutes for one problem and up to 3 days for each remaining problem" means that this is unfortunately not a true representation either. If these students were given up to 15 days (5 problems at "up to 3 days each") instead of 9h, there would probably be more of them that match or beat this score too.
It really sounds like AI solved only a single problem in the 9h students get, so it certainly would not be even close to the medals. What's the need to taint the impressive result with apples-to-oranges comparison?
Why not be more objective and report that it took longer but was able to solve X% of problems (or scored X out of N points)?
Time is not a very interesting dimension here, because humans don't use the same CPU as huge GPU clusters. The binary "is it able to reach a solution given enough resources?" is more interesting (for GPT/Claude the answer is a clear negative).
If these problems were important to solve, redundantly by thousands or millions of people (like the real work that most people do), far more people would put in the effort to learn how to solve these problems.
It's just a weird comparison. Contests are very artificially structured in ways that don't make sense for comparing to computers.
Now, you may say how time is not a useful dimension here, but really, this is where we are seeing a lot of these advances come from: general researchers today do get access to huge compute capability, allowing them to quickly iterate on different ideas. In a sense, they can be less smart about their use of resources and simply try things out: this does drive the innovation (compared to waiting for their turn on a supercomputer nearby).
And finally, time is essential even for humans: given a couple hundred years, they will find the proof for Fermat's last theorem, but they might not do it in 4.5h. Since we are comparing AI capabilities to humans in the article, it's very possible that increased compute will never allow AI to find novel proofs we have not come up with either. That's where the AI bit comes in: we know that brute searching through the entire possible space of proofs is still too expensive for our compute capabilities, so we need AI to emulate our "intuition" when looking for the direction to narrow down the search.
So there are really two reasons time matters: 1. getting enough compute might still be far away (heck, prime factorization and elliptic curves are still the basis of most of the world's cryptography for that reason) and 2. maybe it's not even enough to increase compute capability to make huge jumps in problem solving capabilities (iow, maybe we are reaching the maximum of where the approach can take us).
In other words, a model which can solve any/most school/college exam problems isn't necessarily smart. It can be just a database, or, in fact, a lookup table. A smarter version can be a lookup plus a one-step test. Not saying it's bad, but it doesn't scale to less formalized domains. BTW, in this case the formalization was done by humans.
In case this confuses anyone: the high school students in question are not a standard sample of high school students. AFAIK, they are teams of the ~6 strongest competitive problem solving high school students from each country.
Most of DeepMind’s research is a cost-centre for the company. These press releases help justify the continued investment both to investors and to the wider public.
The effect of establishing oneself as the thought leader in a field is enormous.
For example, IBM's stock went up 15% the month after they beat Kasparov.
I'd say "welcome to the web" but this was true in 1800s newspapers as well.
By contrast, this kind of open ended formalization is something where progress used to be extremely slow and incremental. I worked in an adjacent field 5 years ago and I cannot stress enough that these results are simply out of reach for traditional automated reasoning techniques.
Real automatic theorem proving is also useful for a lot more than pure mathematics. For example, it's simple to write out an axiomatic semantics for a small programming language in Lean and pose a question of the form "show that there exists a program which satisfies this specification".
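A minimal sketch of what that could look like in Lean 4 (a deliberately tiny toy language, not any real verification setup):

```lean
-- A toy expression language with an evaluation semantics.
inductive Expr
  | lit : Nat → Expr
  | add : Expr → Expr → Expr

def eval : Expr → Nat
  | .lit n   => n
  | .add a b => eval a + eval b

-- The "specification": show there exists a program that evaluates
-- to 4. A prover has to synthesize a witness, i.e. write the program.
example : ∃ e : Expr, eval e = 4 :=
  ⟨.add (.lit 2) (.lit 2), rfl⟩
```

Scaling this from toy expressions to memory safety or concurrency protocols is exactly where the search space blows up and learned guidance becomes interesting.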
If this approach scales it'll be far more important than any other ML application that has come out in the last few years.
The blog post indicates the opposite. The geometry problem in the IMO problem set was solved by AlphaGeometry 2, which is an LLM based on Google's Gemini. LLMs are considered relatively general systems. But the other three solved problems were proved by AlphaProof, which is a narrow RL system that is literally based on AlphaZero, the Go and Chess AI. Only its initial (bootstrapping) human training data (proofs) were formalized and augmented by an LLM (Gemini).
Don't dismiss search, it might be brute force but it goes beyond human level in Go and silver at IMO. Search is also what powers evolution which created us, also by a lot of brute forcing, and is at the core of scientific method (re)search.
That makes it, in principle, similar to or even easier than a championship-level chess move, which often takes more than an hour for a professional human (with more training than an IMO high school student) to solve.
Another interesting concern is that when posing a problem to humans, it's fine to pose an "easy" brute-forceable problem, because humans, being slow brute-searchers, need to find more clever solutions. But if you give such a problem to a computer, it can trivialize it. So to test a computer, you need to pose non-easily-brute-forceable problems, which are harder for the computer than the others, but equally difficult for the humans as the other problems are.
The issue is that to find solutions for useful problems you're often searching through highly complex and often infinite solution spaces.
You don't need to be able to prove very hard problems to do useful work. Proving just simple things is often enough. If I ask a language model to complete a task, organize some entries in a certain way, or schedule this or that, write a code that accomplishes X, the result is typically not trustworthy directly. But if the system is able to translate parts of the problem to logic and find a solution, that might make the system much more reliable.
But MCTS was always promising when married to large NNs and DeepMind/Brain were always in front on it.
I don’t know who fucked up on Gemini and it’s concerning for Alphabet shareholders that no one’s head is on a spike. In this context “too big to fail” is probably Pichai.
But only very foolish people think that Google is lying down on this. It’s Dean and Hassabis. People should have some respect.
This is important for more than Math problems. Making ML models wrestle with proof systems is a good way to avoid bullshit in general.
Hopefully more humans will write types in Lean and similar systems as a much better way of writing prompts.
People always want to knock generative AIs for not being able to reason, yet we've had automated systems that reason perfectly well for decades, but for some reason those don't count as AI to people.
In my understanding, proofs are usually hard to transcribe into Lean, which is why nobody _writes_ proofs using Lean.
What is a nlinarith?
The nlinarith documentation has the example:
0 ≤ x^2 if x : ℝ
which humans simply use without proof. The IMO doesn't challenge participants to prove everything, only the main ideas.
Yes, it is true, but getting onto the country-specific team is itself an arduous journey, involving brutal winnowing at every step, e.g. the regional math olympiad, then the national math olympiad, etc.
this is then followed by further trainings specifically meant for this elite bunch, and maybe further eliminations etc.
suffice it to say, that qualifying to be in a country specific team is imho a big deal. getting a gold/silver from amongst them is just plain awesome !
Source: a friend who got silver on the IMO
I found the work environment to be more important than the subject when it comes to work satisfaction. If you are working on a world changing subject with a team of assholes, you are going to have a bad time, some people really have a skill for sucking the fun out of everything, and office politics are everywhere, especially on world changing subjects.
On the other hand, you can have a most boring subject, say pushing customer data to a database, and have the time of your life: friendly team, well designed architecture, time for experimentation and sharing of knowledge, etc... I have come to appreciate the beauty of a simple thing that just works. It is so rare, maybe even more rare than scientific breakthroughs.
Now, you can also have an awesome work environment and an awesome subject, it is like hitting the jackpot... and a good reason to be envious.
Pretty much all the top AI labs are both intensely competitive and collaborative. They consist of many former IMO and IOI medalists. They don't believe in remote work, either. Even if you work at Google DeepMind, you really need to be in London for this project.
Yet no one cares. Everyone's busy watching Magnus Carlsen.
We are human. This means we care about what other humans do. We only care about machines insofar as it serves us.
This principle is broadly extensible to work and art. Humans will always have a place in these realms as long as humans are around.
The proof will only have value if it's meaningful to us.
Nah, as a consumer it makes no difference to me if a meat packing factory or Amazon warehouse employs 5000 or 5 people. To art, this principle is totally real, but for work, it only applies to some/most of it.
Imagine a machine doing "work" that only serves itself and other machines that does no service to humanity. It would have no economic value. In fact the whole concept of "work" only makes sense if it is assigned economic value by humans.
Actually, I was looking up Elo ratings of the top computer chess players, and learned that it is not that trivial to compare these, due to differences in hardware requirements and whatnot.
Fischer was probably the last great player who was unassisted by tools.
IMHO, the largest contributors to AlphaProof were the people behind Lean and Mathlib, who took on the daunting task of formalizing the entirety of mathematics themselves.
This lack of formalization in math papers is what killed earlier attempts at automation, because AI researchers had to wrestle with the human element of figuring out each author's own notation, implicit knowledge, skipped proof steps...
This seems so weird to me - AGI is undefined as a term, imo, but why would you expect "producing something generally intelligent" (i.e. median-human-level intelligence) to be significantly harder than "this thing is better than Terence Tao at maths"?
On the other hand, navigating the real world mostly consists in employing a ton of heuristics we are still kind of clueless about.
At the end of the day, we won't know before we get there, but I think my reasons are compelling enough to think what I think.
In the meantime, while DeepBlue beat the world chess champion, Kasparov, at chess, our best efforts at generalism - LLMs that many (not me!) think are the path to AGI - struggle to play tic tac toe.
"Better than Terence Tao at solving certain formalized problems" (not necessarily equal to "Better than Terence Tao at maths") isn't.
Note that the 6th question is generally the hardest (“final boss”) and many top performers couldn’t solve it.
I don’t know what Lean is or how to view the AI’s proofs, but an AI system that could explain such a question on par with the YouTuber above would be fantastic!
"... We'll be bringing all the goodness of AlphaProof and AlphaGeometry 2 to our mainstream #Gemini models very soon. Watch this space!" -- Demis Hassabis, CEO of Google DeepMind. https://x.com/demishassabis/status/1816499055880437909
Three days is interesting... Not technically silver medal performance I guess, but let's be real I'd be okay waiting a month for the cure to cancer.
As opposed to the 99.9999% of the human population who can't do it even if their lives depend on it?
Then they can train a network to evaluate intermediate positions (score network) and one to suggest things to try next (policy network).
So they had three days to keep training the model, on synthetic variations of each IMO problem.
Also, there's no reason to affirm that an eventual cure for cancer requires fundamentally new methods. Maybe the current methods are sufficient, it's just that nobody has been "smart" enough to put the pieces together. (disclaimer: not an expert at all)
But even if they aren't as fast as humans yet this is very valuable. Both as a stepping stone, and because at a certain scale compute is much easier to scale than skilled mathematicians.
So either (1) "within minutes" was underselling the abilities of the system, or (2) what they actually meant was that the geometry problem was solved in 19 seconds, one of the others "within minutes" (I'd guess #1 which is definitely easier than the other two they solved), and the others in unspecified times of which the longer was ~3 days.
I'd guess it's the first of those.
(Euclidean geometry has been a kinda-solved domain for some time; it's not super-surprising that they were able to solve that problem quickly.)
As for the long solve times, I would guess they're related to this fascinating remark:
> The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found.
Also, for what it's worth, I'm pretty sure that I wouldn't have been able to solve it myself in three days, even if I had access to all of GCP, Azure and AWS (except if I could mine crypto to then pay actual IMO-level mathematicians to solve it for me).
How many cycles of guess-and-check did it take over the course of three days to get the right answer?
If the IMO contestants were allowed to use theorem provers and were given 3 days (even factoring in sleep) would AlphaProof still have gotten silver?
> let's be real I'd be okay waiting a month for the cure to cancer.
I don't think these results suggest that we're on the brink of knowledge coming at a substantially faster rate than before. Humans have been using theorem provers to advance our understanding for decades. Now an LLM has been wired up to one too, but it still took 8x as long to solve the problems as our best humans did without any computer assistance.
First of all, this is not a sport and the point is not to compare AI to humans. The point is to compare AI to IMO-difficulty problems.
Secondly, this is not some hacky trick where brute force and some theorem-prover magic are massaged to solve a select few problems and then you'll never hear about it again. They are building a general pipeline which turns informal natural language mathematics (of which we have ungodly amounts available) into formalized mathematics, and in addition trains a model to prove such kinds of mathematics. This can also work for theory building. This can become a real mathematical assistant that can help a mathematician test an argument, play with variations of a definition, try 100 combinations of some estimates, apply a classic but lengthy technique, etc.
> it still took 8x as long to solve the problems as our best humans did without any computer assistance.
Give it a year and that ratio will be reversed. At least. But also it matters less how long it takes if doubling the number of things reasoning at a best-human level is pronounced "ctrl-c, ctrl-v".
Which means these problems are trivial to solve if you have a computer - you can simply check all possibilities. And is precisely the reason why calculators aren't allowed.
But exhaustive searches are not feasible by hand in the time span the problems are supposed to be solved in - roughly 90 minutes per problem. You are not supposed to use brute force, but to recognize a key insight which simplifies the problem. And I believe even if you did do an exhaustive search, simply giving the answer would not be enough for full points. You would have to give adequate justification.
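For a concrete sense of the asymmetry, here is a made-up toy problem (not an actual IMO problem): a computer disposes of it by exhaustive checking, while a contestant would be expected to find and justify the modular pattern behind it.

```python
# Made-up toy problem: determine all n in 1..100 such that
# n^2 + n + 1 is divisible by 7. Brute force settles it instantly,
# but gives no insight into *why* (n ≡ 2 or 4 mod 7 is what works).
answer = [n for n in range(1, 101) if (n * n + n + 1) % 7 == 0]
print(answer[:6])  # [2, 4, 9, 11, 16, 18]
```

The justification step (showing no other residues work) is exactly the part the brute force never produces.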
Since I think you need an account to read threads now, here's a transcript:
Google DeepMind have produced a program that in a certain sense has achieved a silver-medal performance at this year's International Mathematical Olympiad.
It did this by solving four of the six problems completely, which got it 28 points out of a possible total of 42. I'm not quite sure, but I think that put it ahead of all but around 60 competitors.
However, that statement needs a bit of qualifying.
The main qualification is that the program needed a lot longer than the human competitors -- for some of the problems over 60 hours -- and of course much faster processing speed than the poor old human brain.
If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.
Nevertheless, (i) this is well beyond what automatic theorem provers could do before, and (ii) these times are likely to come down as efficiency gains are made.
Another qualification is that the problems were manually translated into the proof assistant Lean, and only then did the program get to work. But the essential mathematics was done by the program: just the autoformalization part was done by humans.
As with AlphaGo, the program learnt to do what it did by teaching itself. But for that it needed a big collection of problems to work on. They achieved that in an interesting way: they took a huge database of IMO-type problems and got a large language model to formalize them.
However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.
It's not clear what the implications of this are for mathematical research. Since the method used was very general, there would seem to be no obvious obstacle to adapting it to other mathematical domains, apart perhaps from insufficient data.
So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren't too difficult -- the kind of thing one can do in a couple of hours.
That would be massively useful as a research tool, even if it wasn't itself capable of solving open problems.
Are we close to the point where mathematicians are redundant? It's hard to say. I would guess that we're still a breakthrough or two short of that.
It will be interesting to see how the time the program takes scales as the difficulty of the problems it solves increases. If it scales with a similar ratio to that of a human mathematician, then we might have to get worried.
But if the function human time taken --> computer time taken grows a lot faster than linearly, then more AI work will be needed.
The fact that the program takes as long as it does suggests that it hasn't "solved mathematics".
However, what it does is way beyond what a pure brute-force search would be capable of, so there is clearly something interesting going on when it operates. We'll all have to watch this space.
Or if AlphaProof used more compute they could have slashed that time to 1/10 or less. It's arbitrary as long as we don't define what is the compute the AI should be entitled to use here.
I wonder if this new model could be integrated with an LLM somehow? I get the feeling that combining those two powers would result in a fairly capable programmer.
Also, perhaps an LLM could do the translation step that is currently manual?
While there are certainly some huge jumps in compute power, theory of data transformation and availability of data to transform, it would surprise me if computers in a 100 years do not still rely on a combination of well-defined and well-understood algorithms and AI-inspired tools that do the same thing but on a much bigger scale.
If not for any other reason, then because there are so many things where you can easily produce a great, always correct result simply by doing very precise, obvious and simple computation.
We've had computers and digital devices for a long while now, yet we still rely heavily on mechanical contraptions. Sure, we improve them with computers (e.g. brushless motors), but I don't think anyone today is surprised that these same devices (hair dryers, lawn mowers, internal combustion engines...) were designed before computers.
Also it's making me think that in 5-10 years almost all tasks involving computer scientists or mathematicians will be done by AI. Perhaps people going into trades had a point.
Is it clear whether the algorithm is actually learning from why previously attempted solutions failed to prove out, or is it statistically generating potential answers similar to an LLM and then trying to apply reasoning to prove out the potential solution?
If the data is synthetic and covers a limited class of problems I would imagine what it's doing mostly reduces to some basic search pattern heuristics which would be of more value to understand than just being told it can solve a few problems in three days.
Wonder what "great promise" entails. Because it's hard to imagine Gemini and other transformer-based models solving these problems with reasonable accuracy, as there is no elimination of hallucination. At least in the generally available products.
They explicitly stated that to achieve the current results, they had to manually translate the problem statements into formal mathematical statements:
> First, the problems were manually translated into formal mathematical language for our systems to understand.
How I understand what they're saying is that they used Gemini to translate the problem statement into formal mathematical language and let DeepMath do its magic after that initial step.
The article claims they have another model that can work without formal languages, and that it looks very promising. But they don't mention how well that model performed. Would that model also perform at silver medal level?
Also note that if the problems are provided in a formal language, you can always find the solution in a finite amount of time (provided a solution exists). You can brute-force over all possible proofs until you find one that proves the statement. This may take a very long time, but it will find the solution eventually. You would then solve all the problems and win the IMO at gold medal level. AlphaProof seems to do something similar, but makes smarter decisions about which possible solutions to try and which ones to skip. What would be the reason they don't achieve gold?
I know speed is just a matter of engineering, but looks like we still have a ways to go. Hold the gong...
(It's not my main point, but it's always worth remembering - even aside from any AI context - that many top mathematicians can't do IMO-type problems, and many top IMO medalists turn out to be unable to solve actual problems in research mathematics. IMO problems are generally regarded as somewhat niche.)
Combinatorics problems are usually simple enough that anyone can understand and try tackling it though, and the solutions in IMO are usually designed to be elegant. I don't think I've ever seen a bad combo problem before.
I wonder if some class of problems will emerge that human competitors are able to solve but are particularly tricky for machines. And which characteristics these problems will have (e.g. they'll require some sort of intuition or visualization that is not easily formalized).
Given how much of a dent LLM's are already making on beginner competitions (AtCoder recently banned using them on ABC rounds [1]), I can't help but think that soon these competitions will be very different.
I feel the reason combinatorics is harder for AI models is the same reason LLMs are not great at reasoning about anything out of distribution. LLMs are good pattern recognizers, and fascinating at this point. But simple tasks like counting intersections in Venn diagrams require more strategy and less pattern recognition. Pure NN-based models seem unlikely to be enough to solve these problems. AI agents and RL are promising.
I don't know anything about Lean, but I am curious whether proofs of combinatorial problems can be represented as well as number theory or algebra. If combinatorial solutions are always closer to natural language, the failure of LLMs is expected, or at least we can assume it might take more time to improve. I am assuming here that solutions to combinatorial IMO problems are more natural-language oriented and rely more on common sense and informal logic compared to geometry or number theory problems.
Anyone know any details?
>because of limitations in reasoning skills and training data
One would assume that mathematical literature and training data would be abundant. Is there a simple example that could help appreciate the Gemini bridge layer mentioned in the blog which produces the input for RL in Lean?
The link you posted has problems with a difficulty between the first and second rounds, which are much easier.
I took a quick look at the recent lists of problems in the first and second rounds. I expect this new AI to get a solid 50/50 points on this test.
I wonder because on one hand they seem very impressive and groundbreaking, on the other it’s hard to imagine why more than a handful of researchers would use them
If you could automatically prove that your concurrency protocol is safe, or that your C program has no memory management mistakes, or that your algorithm always produces the same results as a simpler, more obviously correct but less optimized algorithm, I think that would be a huge benefit for many programmers.
Here, from what I understand, it's instead a theorem prover + LLM backing it. General proofs have a much larger search space than the 2d geometry problems you see on IMO; many former competitors disparage geometry for that reason.
> AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. … Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified for correctness.
Edit: To defend my response, the model definitely knows when it hasn't yet found a correct response, but this is categorically different from knowing that it does not know (and of course monkeys and typewriters etc., can always find a proof eventually if one exists).
For example, suppose a computer is asked to prove the sum of two even numbers is an even number. It could pull up its list of “things it knows about even numbers”, namely that an even number modulo 2 is 0. Assuming the first number is “a” and the second is “b”, then it knows a=2x and b=2y for some x and y. It then knows via the distributive property that the sum is 2(x+y), which satisfies the definition of an even number.
What am I missing that makes this problem so much harder than applying a finite and known set of axioms and manipulations?
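That particular example is indeed mechanical, and it goes through in Lean with Mathlib almost verbatim (Mathlib defines `Even a` as `∃ r, a = r + r` rather than `a = 2x`, but the argument is the same):

```lean
import Mathlib

-- Sum of two even integers is even, following the "a = x + x,
-- b = y + y, so a + b = (x + y) + (x + y)" argument directly.
example (a b : ℤ) (ha : Even a) (hb : Even b) : Even (a + b) := by
  obtain ⟨x, hx⟩ := ha
  obtain ⟨y, hy⟩ := hb
  exact ⟨x + y, by rw [hx, hy]; ring⟩
```

The difficulty is that IMO-level proofs need long chains of such steps plus a non-obvious key idea, so the space of possible step sequences explodes.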
This makes it natural to address these problems using similar techniques, which is what this research team did. The "magic" in their solution is the use of neural nets to make good guesses about which branches of these massive search trees to explore, and about how good any particular branch is even before they reach the end of the branch. These tricks let them (massively) reduce the effective branching factor and depth of the search trees required to produce solutions to math problems or win board games.
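A minimal sketch of that guidance idea, with hand-written heuristics standing in for the learned policy/value networks (learning those networks is, of course, the whole point of the real system):

```python
import heapq

def guided_search(start, goal, expand, value, max_steps=10_000):
    """Best-first search: expand(state) proposes next states (the
    'policy' role); value(state) scores how promising a state looks
    (the 'value' role). Higher value = explored sooner."""
    frontier = [(-value(start), 0, start)]
    seen = {start}
    tie = 1  # tie-breaker so the heap never compares states directly
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if state == goal:
            return state
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-value(nxt), tie, nxt))
                tie += 1
    return None

# Toy domain: reach 42 from 1 with moves +1 and *2; the heuristic
# prefers states numerically close to the goal.
found = guided_search(1, 42,
                      expand=lambda s: [s + 1, s * 2],
                      value=lambda s: -abs(42 - s))
print(found)
```

AlphaZero-style systems replace those lambdas with deep networks trained by self-play and add MCTS-style visit statistics on top, but the structural role of the two guesses is the same.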
The thing is that most math "problems" are not solved, not because they're hard, but because they're not interesting enough to even be discovered by humans.
That is more than half the work of solving them. Headline should read "AI solves the simple part of each IMO problem at silver medal level"
Real, exact quotes from the top comments at 1 PM EST.
"I want to believe that the computer found it, but I can't find anything to confirm."
"Curing cancer will require new ideas"
"Maybe they used 10% of all of GCP [Google compute]"
So it failed at the first step (comprehension) and hence I think we can request a better effort next time.
(probably confused by version numbers)
IMO contestants aren't allowed to bring in paper tables, much less a whole theorem prover. They're given two 4.5 hour sessions (9 hours total) to solve all the problems with nothing but pencils, rulers, and compasses [0].
This model, meanwhile, was wired up to a theorem prover and took three solid days to solve the problems. The article is extremely light on details, but I'm assuming that most of that time was guess-and-check: feed the theorem prover a possible answer, get feedback, adjust accordingly.
If the IMO contestants were given a theorem prover and three days (even counting breaks for sleeping and eating!), how would AlphaProof have ranked?
Don't get me wrong, this is a fun project and an exciting result, but their comparison to silver medalists at the IMO is just feeding into the excessive hype around AI, not accurately representing its current state relative to humanity.
[0] 5.1 and 5.4 in the regulations: https://www.imo-official.org/documents/RegulationsIMO.pdf
A theorem prover is probably more useful for the typical IMO problem than for the typical real research problem, but even so I'd guess that even with a reasonable amount of training most IMO contestants would not do much better for having access to a theorem prover.
Having three days would be a bigger deal, for sure. (But from "computers can't do this" to "computers can do this, but it takes days" is generally a much bigger step than from "computers can do this, but it takes days" to "computers can do this in seconds".)
In 2016, machines defeated a World Go Champion for the first time, using a clever form of "dumb search" that leverages compute, DNNs, reinforcement learning (RL), and self-play. Critics noted that while this fancy form of "dumb search" worked for Go, it might not necessarily be a general strategy applicable to other cognitive tasks.[a]
In 2024, machines solved insanely hard math problems at the Silver Medal level in an International Math Olympiad for the first time, using a clever form of "dumb search" that leverages compute, DNNs, RL, and a formal language. Perhaps "dumb search" over cleverly pruned spaces isn't as dumb as the critics would like it to be?
---
[a] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Today's announcement is also not about proving Lean theorems by "dumb search". The success is about search + neural networks.
You're attacking critics for criticizing the solution that has failed, while confusing it with the solution that works to this day.
And you keep shifting the goalpost on what's called "dumb" here.
It kinda had to be this way, I think. There's a something from nothing problem. Douglas Adams brilliantly starts at this point.
We don't understand something from nothing. We don't even have the language to describe it. Concepts like "complexity" are frustratingly resistant to formalization.
"There is no free will" has recently resurged as a philosophical position... like it did in response to Newton's mechanics.
Matter from void. Life from minerals. Consciousness from electrons. Free will from deterministic components. Smart reasoning & intelligent rationalisation from dumb search, computation, DNNs and such.
I don't think this debate was supposed to be ended by anything short of empirical demonstration.
Endnote: Deep Blue's victory over Garry was a bunch of preprogrammed bulls--t. Rematch!
Then, should we expect less with mathematics, where written language is the normal way knowledge is propagated, and where formal proofs are wanted? An important distinction here is the coupling of search (not an LLM for this one), a formal math language, and theorem proving. Math intelligence may not come merely from the written math corpus, but adding a formal language and theorem proving sounds pretty powerful.
All this still lacks self-directed goals. An intention. For now that's taken care of by the human asking questions.
> In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.
Why not compare with students who are given 3 days to submit an answer ? /s
Not open-sourcing anything.
This is a dead end on which no further research can be built.
It violates pretty much every principle of incremental improvement on which science is based. It's here just for hype, and the 300+ comments prove it.
[...]
"IMO medalists have also gone on to become notable computer scientists. The following IMO medalists have received a Nevanlinna Prize, a Knuth Prize, or a Gödel Prize; these awards recognise research in theoretical computer science."
https://en.wikipedia.org/wiki/List_of_International_Mathemat...
[1]: https://www.aeaweb.org/articles?id=10.1257/aeri.20190457
There are so many competitions that don't have any obvious practical significance. And people are still enjoying competitions where AI completely pwns humans.
Also, this is probably a good time to ask whether you won the Putnam... https://news.ycombinator.com/item?id=35079
Got to be kidding me. We are fucked
I don't really find this that impressive. With enough compute you could just do n-of-10,000 LLM generations to "brute force" a difficult problem and you'll get there eventually.
You realize this holds true for all of math, right? Outside of Gödel-incompleteness potholes, every proof/theorem is a permutation of ZFC. And you can fix the potholes by just filling them in with more Cs.
Sounds perfect for a GPT model, with lots of input training data (problem books and solutions).
The proofs of these problems aren't interesting. They were already known before the AI started work.
What's interesting is how the AI found the proof. The only answer we have is "slurped data into a neural network, matched patterns, and did some brute search".
What were the ideas it brainstormed? What were the dead-end paths? What were the "activations" where the problem seemed similar to a certain piece of input, which led to a guess of a step in the solution?
And how much CO2 was released into earth's atmosphere?