An OpenAI model has disproved a central conjecture in discrete geometry

(openai.com)

1429 points · tedsanders · 1mo ago · 1055 comments

https://x.com/wtgowers/status/2057175727271800912, https://xcancel.com/wtgowers/status/2057175727271800912

243 comments · 84 top-level

m-hodges · 1mo ago · 27 in thread

To the “LLMs just interpolate their training data” crowd:

Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.

I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.

pseudocomposer · 1mo ago

I'd hope most functional adults understand that the Fields Medal and basically every other annual "prize" out there is awarded to both "recombinant" innovations and "new-dimensional thinking" innovations. Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.

I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.

ykl · 1mo ago

I like to think of it as:

Imagine every bit of human knowledge as a discrete point within some large high dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in a space. An LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside of the hull. Then there are points completely outside of the hull; whether or not LLMs can reach these is IMO up for debate.

Reaching new points inside of the hull is still really useful! Many new discoveries and proofs are these new points inside of the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but you can arrive at by using what we already have as starting points. Many math proofs and Nobel Prize winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.

Then there are the points completely outside of the hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
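
A toy sketch of the hull picture above, in Python. It assumes knowledge can be flattened to 2D points and that the hull is a single triangle, which is only an illustration and not a claim about how LLMs work; the names `convex_combination`, `barycentric`, and `inside_hull` are made up for this example. A point built from non-negative weights that sum to 1 lands inside the hull, while a point that would need a negative weight lies outside it.

```python
def convex_combination(points, weights):
    """Blend points with non-negative weights that sum to 1 (an interpolation)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))


def barycentric(p, a, b, c):
    """Barycentric coordinates of the 2D point p relative to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return (w_a, w_b, 1.0 - w_a - w_b)


def inside_hull(p, a, b, c):
    """True when p is a convex combination of a, b, c (all weights >= 0)."""
    return all(w >= -1e-12 for w in barycentric(p, a, b, c))


# Three "known" points (hypothetical data) spanning a triangular hull.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)

new_point = convex_combination([A, B, C], [0.5, 0.25, 0.25])
print(new_point, inside_hull(new_point, A, B, C))  # a new point, still inside
print(inside_hull((1.0, 1.0), A, B, C))            # needs a negative weight
```

In higher dimensions with many points, hull membership is usually checked with a linear program rather than barycentric coordinates; the triangle case is used here only because it fits in a few lines.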

dvt · 1mo ago

> I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created", and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. who treat math more like an art than a science.

For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.

stego-tech · 1mo ago

As others have pointed out, both can be true:

* LLMs do just interpolate their training data, BUT-

* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data

In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!

As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for new discoveries than for eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited populace of expertise.

For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.

midtake · 1mo ago

You have a good point about the human rate of mathematical discovery, but Ayer was an idiot and later Witt contradicted early Witt. For the "already implicit" claim to be true, mathematics would have to be a closed system. But it has already been proven that it is not. You can use math to escape math, hence the need for Zermelo-Fraenkel and a bunch of other axiomatic pins. The truth is that we don't really understand the full vastness of what would objectively be "math" and that it is possible that our perceived math is terribly wrong and a subset of a greater math. Whether that greater math has the same seemingly closed system properties is not something that can be known.

hammock · 1mo ago

Recombining existing material is exactly right, and in this case LLMs were uniquely positioned to make the connection quicker than any group of humans.

The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.

Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.

nomel · 1mo ago

I feel this is the case whenever I "problem solve". I'm not really being creative, I'm pruning a graph of a conceptual space that already exists. The more possibilities I see, the easier it is to run more towards an optimal route between the nodes, but I didn't "create" those nodes or edges, they are just causal inevitabilities.

thechao · 1mo ago

You can build a census of all gen-2, degree-2 formal products of polynomial-like terms. If you insist on instituting your own rewrite rules and identity tables, it is straightforward (maybe 15 minutes of compute time) to perform a complete census of all of the algebraic structures that naturally emerge. Every even vaguely studied algebra that fits in the space is covered by the census (you've got to pick a broad enough set of rewrite- and identity- operations). There's even a couple of "unstudied" objects (just 2 of the billion or so objects); for instance:

    (uv)(vu) = (uu)(vv)

Shows up as a primitive structure, quite often.

If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
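
For a feel of what such a census looks like at the smallest scale, here is a hedged Python sketch. It is not the census described above (no formal products, rewrite rules, or identity tables); it simply enumerates every binary operation on a tiny finite set and keeps those satisfying (uv)(vu) = (uu)(vv). The helper names `satisfies_identity`, `all_magmas`, and `census` are hypothetical.

```python
from itertools import product


def satisfies_identity(table):
    """Check (uv)(vu) == (uu)(vv) for all u, v; table[u][v] is the product u*v."""
    n = len(table)

    def mul(a, b):
        return table[a][b]

    return all(
        mul(mul(u, v), mul(v, u)) == mul(mul(u, u), mul(v, v))
        for u, v in product(range(n), repeat=2)
    )


def all_magmas(n):
    """Yield every binary operation on {0, ..., n-1} as an n-by-n table."""
    for flat in product(range(n), repeat=n * n):
        yield [list(flat[i * n:(i + 1) * n]) for i in range(n)]


def census(n):
    """Return all operation tables of size n that satisfy the identity."""
    return [t for t in all_magmas(n) if satisfies_identity(t)]


if __name__ == "__main__":
    hits = census(2)
    print(f"{len(hits)} of {2 ** 4} two-element operations satisfy the identity")
```

`census(2)` scans all 16 operations on a two-element set. The search space grows as n^(n^2), so by n = 4 there are already 4^16 tables and brute force stops being practical; a real census would need the rewrite rules and identity tables the comment mentions to prune the space.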

sillysaurusx · 1mo ago

It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

mrandish · 1mo ago

I'm just hoping we're almost past this phase of needing to assess LLM capabilities against an arbitrary one-dimensional yardstick labeled 'Not Human' on one end and 'Beyond Human' on the other.

It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov, it didn't change anything but in the sense some animals and machines have always been 'better' on some dimensions than humans. And it's pointless because there's never been just one yardstick and even if there was it's not one dimensional or even linear. Everyone has their own yardstick and the end points on each change over time.

Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.

awesome_dude · 1mo ago

There was a project long long ago where every piece of knowledge known was cross pollinated with every other piece of knowledge, creating a new and unique piece of knowledge, and it was intended to use that machine to invalidate the patent process - obviously everything had therefore been invented.

But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.

THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.

Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.

LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.

edit: I am going to go further

We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller

We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms

That much is derived from previous knowledge

What isn't, and what LLMs cannot create, is the tools by which we can detect or see these incredibly small things

austinl · 1mo ago

I'm not sure how feasible this is, but I love the thought experiment of limiting a training set to a certain time period, then seeing how much hinting it takes for the model to discover things we already know.

E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.

libraryofbabel · 1mo ago

This is a good point, and there’s some deep philosophical questions there about the extent to which mathematics is invented or discovered. I personally hedge: it’s a bit of both.

That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.

So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.

smaudet · 1mo ago

If anything, this is more illustration of how llms are not useful to us...

They will do their own thing, don't need us. In fact, we will be in the way...

We can choose to study them and their output, but they don't make us better mathematicians...

anon291 · 1mo ago

To every proof, there is a corresponding program. This makes proofs expressible in a language made up of finite grammatical rules and terminal symbols. Knowledge accessible by proof is thus always a form of interpolating data whether made up by an AI model or a human mathematician. The people dismissing AI because of claims that it can only interpolate data don't have a good understanding of what it means to know something. Now of course not everything can be known via proof but for the sorts of things that we want to know via a computer this is a fine compromise.
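
The proofs-as-programs correspondence mentioned above (Curry-Howard) can be seen in a couple of lines of Lean 4. This is only an illustrative sketch, and the theorem names are made up here. Each proof term is literally a small program: a proof of an implication is a function, and a proof of a conjunction is a pair.

```lean
-- Modus ponens: the proof is just function application.
theorem mp_example (P Q : Prop) (hpq : P → Q) (hp : P) : Q :=
  hpq hp

-- Swapping a conjunction: the proof takes a pair apart and rebuilds it.
theorem and_swap_example (P Q : Prop) (h : P ∧ Q) : Q ∧ P :=
  ⟨h.2, h.1⟩
```

Both terms are finite strings over a fixed grammar, which is the sense in which the comment says proofs are expressible with finite grammatical rules and terminal symbols.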

paulddraper · 1mo ago

"LLMs just interpolate their training data"

Cracks me up.

What exactly do we think that human brains do?

zerr · 1mo ago

There is a creational aspect in math - definitions and rules are created.

mejutoco · 1mo ago

It is the old discovery vs invention in mathematics.

Or if you prefer philosophy: Parmenides (nothing changes) vs Heraclitus (you cannot bathe twice in the same river aka everything changes all the time).

Postmodernism also claimed that everything has been done already. IMO these 2 are points of view that one can adopt, not truths based on fact. So the distinction is a matter of taste or perspective, not of truth, IMO.

oh_my_goodness · 1mo ago

We know that LLMs "just interpolate" their training data. Maybe there's a mystery about what "just interpolate" means when the data set gets enormous. But we know what LLMs do.

adam_arthur · 1mo ago

Pretty much everything that appears novel in life is derivative of other works or concepts.

You can watch a rock roll down a hill and derive the concept for the wheel.

Seems pretty self evident to me

jonahx · 1mo ago

> held that mathematical truths don’t report new facts about the world

I'm not as familiar with the early work, but later Wittgenstein held this belief too.

chr15m · 1mo ago

Side note: don't underestimate how much literal, physical time and energy "unfold" implies. Proofs occur on physical substrates.

block_dagger · 1mo ago

This is the second reference to Wittgenstein I’ve seen today in totally different contexts. Reminded me how much I vibe with his Tractatus.

cyanydeez · 1mo ago

I think someone should be talking to Gödel.

goldylochness · 1mo ago

this is an excellent point, new ground isn't necessarily novel, it's a rearrangement of existing pieces

BoredPositron · 1mo ago

Post hoc ergo propter hoc

voooduuuuu · 1mo ago

I think you are conflating composition and prediction. LLMs don't compose higher abstractions from the "axioms, symbols and rules", they simply predict the next token, like a really large spinning wheel.

vatsachak · 1mo ago · 18 in thread

As I have stated before, AI will win a Fields Medal before it can manage a McDonald's

A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.

trostaft · 1mo ago

> A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.

Terr_ · 1mo ago

> manage a McDonald's

Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.

> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]

> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.

[0] https://en.wikipedia.org/wiki/Manna_(novel)

Lerc · 1mo ago

I disagree. It will be able to perform work deserving of a Fields Medal before it is capable of running a McDonald's. I think it will be running a McDonald's well before either of those things happen, and win a Fields Medal long after both have happened.

evenhash · 1mo ago

The proof is not written in Lean, though. It’s written in English and requires validation by human experts to confirm that it’s not gibberish.

jeremyjh · 1mo ago

Stockfish did not teach itself to play chess. You are probably thinking of Leela Chess Zero - an open re-implementation of AlphaZero - both were given nothing but the rules of chess and a board and played millions of games against themselves until they were the strongest engine available at the time.

Stockfish's neural net evaluation model was trained on millions of its positions with its own original algorithmic evaluation function (entirely developed by humans) and search tree. The result was a much smaller model than Leela's that requires little computation (not even a GPU), paired with its already extremely efficient search/pruning algorithms that made it stronger than Leela in competitive play. Leela's evaluation function is much stronger (at one ply it has an Elo of around 2300, Stockfish is probably closer to 1800), but it requires vastly more resources and those are always bounded in a match.

Humans haven't learned as much new information about chess from Stockfish as we have from Leela.

soupspaces · 1mo ago

Lee Sedol, Move 37: https://www.reddit.com/r/singularity/comments/1l0z5yk/the_mo...

Edit: I wasn't necessarily disagreeing. But on second thought the chessboard in this math analogy is being built, not just played in. This Hardy quote comes to mind: https://www.goodreads.com/quotes/902543-it-proof-by-contradi...

energy123 · 1mo ago

The issue with this prediction is the gulf between problem-solving using known tools, versus creating new concepts for problems where existing tools aren't enough.

All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.

Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.

We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.

Timothy Gowers gives some commentary on this in the attached PDF.

auggierose · 1mo ago

> A difficult part was constructing a chess board on which to play math

We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.

KalMann · 1mo ago

I think your analogy is good but I don't believe modern LLMs use Lean or any Lean-like structure in their proofs. At least recent open source ones like DeepSeek can do advanced math without it (maybe the most cutting edge ones are doing it; I can't say).

forinti · 1mo ago

AI is already too old for that.

sigmoid10 · 1mo ago

Managing a McDonald's is a question of integration and modalities at this point. I don't think anyone still doubts that these models have the reasoning capability or world knowledge needed for the job. So it's less of a fundamental technical problem and more of a process engineering issue.

volkercraig · 1mo ago

> we'll see more specialized math AI resembling StockFish soon

Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.

brikym · 1mo ago

Hey ChatGPT, if a person spills hot McCoffee on themselves who is at fault?

fapjacks · 1mo ago

I dunno. Is AI less than forty years old?

whimsicalism · 1mo ago

the only thing keeping the mcdonalds from happening will be political, likewise the same with fields medal

ori_b · 1mo ago

We're automating art and science so that we can flip burgers. This future sucks.

segmondy · 1mo ago

our local AI models are already capable of running McDonalds.

dyauspitr · 1mo ago

Nonsense. Have you been watching the Figure live stream? Or the Unitree video from yesterday with real time novel action generation? We’re less than a year away. If you can cook a burger, assemble a sandwich and clean up surfaces you’re all of the way there.

trostaft · 1mo ago · 15 in thread

Speaking as a postdoc in math, I must say that this is rather exciting. This is outside of my field, but the companion remarks document is quite digestible. It appears as though the proof here is fairly inspired by results in the literature, but the tweaks are non-trivial. Or, at least to me, they appear to be substantial to where I would consider the entire publication novel and exciting.

Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back and forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .

Nevertheless, one thing that struck me in this writeup was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.

One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.

xbmcuser · 1mo ago

This is the main thing that I keep harping on: human knowledge is too vast today for a person or even a group of people, and LLMs will change that. Many discoveries that required serendipity in the past will be more likely than ever.

energy123 · 1mo ago

Terence Tao gave a recent talk about this issue (lack of attribution). He called it the decoupling of implicit and explicit goals. AI is only good at solving the explicit goals for now, and humans don't have the bandwidth or the institutions to know how to integrate AI into the field.

https://youtu.be/Uc2zt198U_U?si=OkwO3xT8-zhSABwh

inciampati · 1mo ago

I am also using these models to accelerate scientific discovery. Yes, they are making all the difference at the frontier. At least, they feel they are. The messy thing is that we still need to communicate with each other and that's not getting dramatically faster or better. As you note the models need to be built so they do more work to participate in our communication economy. Or we will do so much, alone, to get nowhere fast because so much of our behavior is still bound up in old (good, tested, but clunky) ways of building shared knowledge.

ontouchstart · 1mo ago

> I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.

This is a very important point, especially when the output is from a non-deterministic random walk with some unknown probability distribution.

JohnHammersley · 1mo ago

Yes, I share your optimism overall, although I think it is raising a question of what the future role of the researcher is (much like the current debate on developer roles).

I attended a conference on AI for maths and open science a few weeks ago, and was struck by just how many examples of AI-supported solutions there already are. Virtually every speaker had an example of their own use of (often the frontier) AI models in solving a problem that was previously too hard (for various definitions of hard).

I wrote up a few notes [1], and most of the speaker videos are available via the conference website [2].

[1] https://scholarlyfutures.substack.com/p/ai-and-the-practical...

[2] https://www.newton.ac.uk/event/ooew11/

shalmanese · 1mo ago

> But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced.

AI is going to both help and hinder this process though. At the end of the day, mathematics is mostly a social process at this point. The goal is not raw number of theorems proven, it’s how proving theorems affects the working operational models of mathematicians. Only a rare few new theorems in mathematics nowadays have direct real world applicability.

If AI produces legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.

isotypic · 1mo ago

I cannot quite share your enthusiasm. The clearest analogy that I can think of to try to explain why I feel this way is that it seems there will eventually be a phantom textbook of all of mathematics contained in the weights of an LLM; every definition, every proof, etc; and the role of a mathematician is going to be reduced towards reading certain parts of this phantom textbook (read: prompting an LLM to generate a proof or explore some problem) and sharing the resulting text with others, which of course anybody else could have found if they simply also knew the right point of the textbook.

To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract it from this phantom textbook anyways.

You could argue for there being work still like marginal improvements and applying the returned proof to other scenarios as happened in this case, but as above, what is really there to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?

I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.

qnleigh · 1mo ago

Can you describe what the reaction to these results has been like in your department? Obviously many people are excited, but what else? How do grad students feel about this? Are any professors getting worried about becoming obsolete?

colordrops · 1mo ago

Maybe I'm misunderstanding how these models work, but isn't it more the responsibility of the harness and its prompts rather than the model itself to make sure that a result is generated with explicit sources?

teiferer · 1mo ago

> Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced.

And by opening the door to LLM-generated results, you'll see greater and greater amounts without any hope of ever navigating this field again without machine help.

It's a little like a software project which more and more gets extended by AI agents with less and less review by human software engineers and in the end the complexity and spaghetti design are so incomprehensible by humans that the maintenance requires an AI agent. The risk is that math as a whole (the field itself) will experience that effect.

bandrami1mo ago

I am curious if LLMs are better at some kinds of problems than others. IIRC this and another big recent one were cases of the LLM producing a counterexample to a conjecture.

julianozen1mo ago

Nice response to read

Mikhail_K1mo ago

> But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier.

They do the opposite by locking the results they produce within a slop presentation that needs more AI to comprehend.

doctorpangloss1mo ago

> Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.

Always, always, always, the problem with research and development is leadership, not insufficient supportive technology. It is a political problem; there is absolutely, positively no shortage of technologies to support research. Your optimism is totally misplaced. The NSF funding cuts have negatively impacted math more than AI has benefitted it. And guess who supports the administration that cut NSF funding? The people who ousted the PhDs from OpenAI.

umanwizard1mo ago

Why would it excite you, rather than terrifying you? The better LLMs get at math, the closer the expertise you spent your whole life building is to being worthless.

Along with all the rest of what humans find meaningful and fulfilling.

cpard1mo ago· 12 in thread

The proof brings unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question.

The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.

To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that humans in science have to go through today. On one hand that specialization is important; on the other hand it limits a person in terms of the tooling and inspiration they have access to.

rjzzleep1mo ago

What you describe here has always been true in all sciences, and in medicine too. But both modern engineering and education run completely counter to this. You are encouraged to stay in your niche and never look out. People with broad interests are filtered out by hiring managers.

So the cross-domain pollination that used to exist among scientists is not only not encouraged. It's also actively punished by society.

freakynit1mo ago

Many breakthroughs come from taking an idea from one field and applying it somewhere else. But, almost every serious field is now so deep/complex/huge that humans rarely get the time, or even have enough practically useable memory, to understand and correlate multiple unrelated areas properly.

And this is where machines, such as these reasoning LLMs, can help, because they can remember patterns across many domains and try absolutely bonkers, weird connections and ideas.

We, the humans, still have to verify the work (at least as of now). But the "maybe this tool, or idea, or trick, from that completely unrelated field applies here" reasoning/experimentation could become much easier.

I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.

doubledamio1mo ago

I’ve always been skeptical about the role of LLMs in mathematics, but this is the first time I’ve seen this argument, and I actually find it very compelling. Maybe LLMs will help us develop more horizontal understanding of the field.

keyle1mo ago

I think you're on point, and you've explained it very well.

As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.

dhosek1mo ago

One of the challenges I had in graduate mathematics was just trying to keep all the concepts in my brain. It doesn’t help that you end up with things like homomorphism and homeomorphism tangling one’s brain thanks to their superficial similarities. Heck, just keeping track of basic theorems and definitions is a challenge.

margorczynski1mo ago

Yep. The thing is people (maybe because of our limited scope) just focus on the depth and not the breadth. Because this is a general purpose model - it also has PhD+ knowledge in Physics, Biology, History, etc.

I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.

efavdb1mo ago

It’s as if the body of human knowledge is our hive mind. It used to be expensive to access that, but no more.

Cool thing is now when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.

mxfh1mo ago

That's the whole point of LLMs: connecting all the missing dots no single human could possibly keep in working knowledge, even for just one subfield of mathematics. The era of polymaths is over for a reason, so we build a new one to tackle that. Whether LLMs can build on top of that once all the remaining dots are found, or whether this stalls, is yet to be proven, but humans stalled out there too.

nashadelic1mo ago

There are so many research papers; just finding a solution to, say, a bio problem in a deep math paper would be a gold mine of opportunity. Very exciting times!

psb51mo ago

Check out Ashby's Law of Requisite Variety

make31mo ago

To me, AI feels like the morbidity of Star Trek teleportation, where it's actually copying the person to the other end and zapping the original one out of existence. The original human never benefits from the fast transportation.

Similarly, we're creating tools to improve knowledge, but we're progressively zapping the human out of the equation. Knowledge is created for something, but it's unclear if very soon humans will be able to understand it, or really benefit from it, except billionaires, etc.

It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.

nicman231mo ago

like the research team that rediscovered calculus for treating diabetics

raincole1mo ago· 12 in thread

I like how everyone laughed when OpenAI said their models will have "PhD-Level Intelligence" and now the goalpost has been moved to if AI can create new math (i.e., not PhD-Level, but Leibniz/Euler/Galois level.)

bananaflag1mo ago

As a mathematician, new, conceptual math is when I'll become interested in reading LLM output.

I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.

(This is not snobbery, just a personal preference.)

golol1mo ago

No, it is not Leibniz/Euler/Galois. More like writing good papers that contribute to the broader understanding of a theory. I think if one evaluated a mathematician's research output and it consisted mostly of the kinds of problems AI has solved so far, it would give the impression that this person is somehow very good at picking accessible problems to target, but has not made a larger impact on the field.

InsideOutSanta1mo ago

What were you imagining when OpenAI said that their models would have "PhD-Level Intelligence"? Were you imagining that there were specific tasks they could do that were on par with what a human with a PhD could do? Because by that definition, many computer tools have "PhD-Level Intelligence". By that definition, Wolfram Alpha has "PhD-Level Intelligence".

What I assumed they were saying is that their LLMs would be as intelligent as a human with a PhD across all, or at least most, knowledge tasks, and they clearly are not.

zamadatix1mo ago

My only complaint is the claims always start spreading 6-12 months before the delivery. A little patience goes a long way in what's possible with AI and we all just have to wait and see what parts actually grow this next cycle or not. Guessing at it based on trend lines only leads to people getting excited when it matches their particular guess and ignoring it when it doesn't.

necovek1mo ago

PhDs used to mean publishing a novel mathematical result: when has that changed?

turzmo1mo ago

Not denying that these advances are impressive, but it is important to consider that this is a cherry-picked result. This doesn’t mean that AI can now be expected to do problems of similar or lower difficulty, but that it happened to work well on one problem. What you won’t see is how many others they had to try to get this result.

xigoi1mo ago

It would be more impressive if it wasn’t behind the closed doors of a rich company. For all we know, they could’ve paid some mathematicians to work on the problem and pretend that their results are from ChatGPT.

dawnerd1mo ago

Yet it still codes like a junior developer that memorized all of stack overflow.

staticman21mo ago

What's laughable is an OpenAI employee invented the term "PHD level intelligence" and you think that "PHD Level intelligence" is a real term that describes a real thing and you are repeating it here.

zeofig1mo ago

I still laugh.

Valakas_1mo ago

And the goalposts will keep getting moved all the way to the singularity. And then those people will/would say "Oops. I was wrong."

xgulfie1mo ago

large language models do not have pigeon-level intelligence. They can't even feed themselves.

0x5FC31mo ago· 10 in thread

Is there a reason why we only hear of Erdos problems being solved? I would imagine there are a myriad of other unsolved problems in math, but every single ChatGPT "breakthrough in math" I come across on r/singularity and r/accelerate is an Erdos problem.

jltsiren1mo ago

Erdős problems form a substantial fraction of all mathematical problems that have been explicitly stated but not solved; are sufficiently famous that people care about them; and are sufficiently uninteresting that people have not spent that much effort trying to solve them.

Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.

bananaflag1mo ago

Erdos problems are easier to state, thus they make a great benchmark for the first year of AI mathematics.

tonfa1mo ago

Afaik this is because there is a community and database around them.

empath751mo ago

It's a large set of problems that are both interesting and difficult, but not seen as foundational enough or important enough that they have already had sustained attention on them by mathematicians for decades or centuries, and so they might actually be solvable by an LLM.

TrackerFF1mo ago

As others have written, Erdős was a lifelong curator of mathematical problems, from high-school level problems to the types that will land you a Fields medal. Like the Collatz conjecture.

Most new math problems appear in other papers, doctoral dissertations, etc. Usually you'll find them in the "future work" / "future research" section.

So obviously in order to present and formalize these problems, you either need the author(s) to do it, or some reader. At this level of math, there are many extremely niche fields, where the papers might only be read by a small number of people.

In short, it is a visibility problem.

But, I figure, there's some potential use in AI models to extract and present these problems, which would make them available to a larger audience.

That is exactly what Erdős did. His life revolved around math, and seeking mathematical questions.

throw-the-towel1mo ago

They're just famous because Erdos was a great mathematician, kinda like the Hilbert problems a century earlier.

famouswaffles1mo ago

It's not just Erdos problems - https://news.ycombinator.com/item?id=48213189

odie55331mo ago

I was promised a cure for cancer, but all I got was this disproof of an Erdos problem.

cold_harbor1mo ago

Erdos problems are well-posed for AI: elementary statements, exact counterexample targets, extensively catalogued. Selection bias: these are exactly the problems AI can actually search.

xyzsparetimexyz1mo ago

The models can't actually do good work on practical problems, so OpenAI tasks them with stuff nobody cares about.

aurareturn1mo ago· 7 in thread

One thing that seems certain is that OpenAI models hold a distinct lead in academics over Anthropic and Google models.

For those in academics, is OpenAI the vendor of choice?

Jcampuzano21mo ago

OpenAI specifically targeted Academia a lot and gave out a lot of free/unlimited usage to top academics and universities/researchers.

They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.

tracerbulletx1mo ago

Hasn't AlphaFold been used to make real discoveries for a few years now?

karmasimida1mo ago

I think the mathematicians on X are all using GPT 5.5 Pro

bayindirh1mo ago

From my limited testing, Gemini can dig out hard-to-find information provided you detail your prompt enough.

Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.

If I can't find it for a week digging the internet, I give it a colossal prompt, and it digs out what I'm looking for.

logicchains1mo ago

OpenAI models seem to have been trained on a lot of auto-generated theorem proving data; GPT 5.5 is really good at writing Lean.

FloorEgg1mo ago

Gemini seems better trained for learning, and I think Google has made a more deliberate effort to optimize for pedagogical best practices (e.g. tutoring, formative feedback, cognitive load optimization).

As far as academic research is concerned (e.g. this thread's topic), I can't say.

causal1mo ago

A simpler explanation is that more people are using ChatGPT

libraryofbabel1mo ago· 7 in thread

This HN thread depressed me. I’m still thinking about why.

Look past the press-releasey gushing from OpenAI and there are all sorts of interesting and subtle questions here about the role for LLMs in mathematical research. I urge folks to click through to the accompanying comments from mathematicians published alongside the result. There is a really interesting discussion going on. I particularly recommend Tim Gowers’ remarks. This is really interesting stuff!

Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.

Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?

I wonder if we’ll still be doing this two years hence.

getnormality1mo ago

Yes, this and every internet forum will still be doing this two years hence. Your life will be better if you take to heart this famous passage from Nietzsche:

I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.

jryan491mo ago

People are afraid for their livelihood. What do you expect?

doginasuit1mo ago

I find it understandable, it is common to evaluate human intelligence vs AI as a zero-sum competition, because that is how employers typically understand it and LM providers market it. AI proving itself moves the needle in an uncomfortable direction for all of us without very robust job security.

> I wonder if we’ll still be doing this two years hence.

It is going to take some time for people to recognize that AI has a very different set of competencies that complements human intelligence rather well. It is unlikely to eclipse human intelligence at scale, and the companies betting on that will fall behind. That is when the conversation will start to shift.

godelski1mo ago

I think part of it is that one side throws rocks, so it never even matters what is in the article. It becomes a battle over whether the article is good or the article is shit.

Yes, I'm tired too. I want to have real discussions about these things. But the problem is everyone believes their reality is real and anyone's reality that disagrees is fake. It just escalates. I take long breaks from HN because I realize I just come to the forums and end up being angry. Why do we do this to ourselves? The reality is that at a core level we usually want the same things.

RedCinnabar1mo ago

What I don’t understand is why people dismiss this kind of progress with false claims. Especially when discussing programming, people start to act irrationally, using arguments from back in 2022.

I think that you can easily address your concerns about this new technology (since we all are concerned about the future) but at the same time acknowledge how revolutionary it is.

scosman1mo ago

We won’t be doing it in 2 years. By then my side will have won!

ex-aws-dude1mo ago

Let's just be real: it's because a lot of programmers' egos are built on intelligence/being a coding wizard, and this threatens that ego.

If suddenly anyone can code we're not that special anymore.

throw-the-towel1mo ago· 6 in thread

See the longstanding debate on whether new math is "invented" or "discovered". Most mathematicians I knew thought it's discovered.

amelius1mo ago

This is like saying a sculpture always existed, the sculptor just had to remove the superfluous material.

Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.

Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.

skybrian1mo ago

Any design already exists as a possibility, so it could be said to be both invented and discovered, depending on how you look at it.

soupspaces1mo ago

Regardless of which, both Newton and Leibniz imprint in their findings a 'voice' and understanding different from each other and that of an LLM (for now?)

protoplancton1mo ago

One can argue that mathematical facts are discovered, but the tools that allow us to find, express, and prove them are mostly invented. This goes all the way up to the axioms, which we can deliberately choose and craft.

ASalazarMX1mo ago

Math is an abstraction of reality, it had to be invented, so more inventions or discoveries could be made within it.

atmosx1mo ago

...long standing indeed. It can be traced back to Plato's works.

lesostep1mo ago· 5 in thread

I am cautious about AI "discoveries" after the Mythos paper.

What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go or was there someone who pointed out mistakes?

How many attempts were made before a solution was found?

I will eat my words if an AI oneshotted that one without any external help, but for now I am left wondering whether it's a new way to attribute discoveries to companies instead of people who put the work in.

andy12_1mo ago

> Was the question asked by a mathematician?

As per the report, the prompt used to solve the problem is AI-written and the solution was initially graded by an AI grading pipeline. They don't say this explicitly, but it seems like OpenAI has an automatic pipeline where they prompt models for solutions to famous math problems (which wouldn't be unexpected given how flashy a solution to a famous math problem looks)

> Was the paper right from a get-go or was there someone who pointed out mistakes?

Also as per the report, the output of the model isn't really a "paper"; it's a very terse 2 page solution which is apparently correct. The paper was later written based on this solution to make it more presentable.

> How much attempts were made before solution was found?

Given that this appears to be from an automated pipeline, I would say that it had many attempts. But either way, the blogpost says that with enough test-time compute, the model finds this same solution 50% of the time.

[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...

0xDEAFBEAD1mo ago

>I am cautious about AI "discoveries" after Mythos paper.

Can you be more specific? I'm still under the impression that Mythos was a huge deal:

https://xcancel.com/hlntnr/status/2052479493801975987

rithdmc1mo ago

> I will eat my words if an AI oneshotted that one without any external help

like having a colleague peer review your paper, or bouncing ideas off a mentor before you write them down?

I agree there's a lot of AI marketing BS at the moment, but revising approaches based on feedback is a good thing.

crnkofe1mo ago

I'm also wondering about the process. What was the prompt, what they fed into the model, what it was trained on, etc. The article reads like a marketing post.

Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.

utopiah1mo ago

No need to be cautious.

If a for-profit (because... you know, OpenAI isn't at all what it initially was) huge corporation (again, not a cute startup trying to help humanity) publishes anything, it's a piece of marketing. Every single word a corporation says is marketing.

So... that's also that, a piece of marketing to sell more of whatever their potential client can buy. It's not a piece of research. It's an ad. That's it.

lubujackson1mo ago· 5 in thread

For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.

Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.

Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.

daishi551mo ago

> the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.

Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?

cubefox1mo ago

There is a long and interesting recent essay on that topic by a mathematician: https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...

kamaal1mo ago

>>But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.

Future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to your goal.

I don't see how math might not go that way as well.

anon2911mo ago

It is not only unsurprising; it was always expected. There is no difference between programs and proofs. They are the same thing.
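
For anyone who hasn't seen the "programs are proofs" idea (usually called the Curry-Howard correspondence) written down, here is a tiny Lean 4 sketch. It is only an illustration of the general idea, not anything taken from the OpenAI proof, and the names `modusPonensExample`, `applyFn` and `andSwapExample` are made up for this comment.

```lean
-- A proof of B from (A → B) and A is literally function application.
theorem modusPonensExample (A B : Prop) (f : A → B) (a : A) : B := f a

-- The same term shape, read as an ordinary program on data.
def applyFn {α β : Type} (f : α → β) (a : α) : β := f a

-- "A and B implies B and A": the proof just swaps the two halves of a pair.
theorem andSwapExample (A B : Prop) (h : A ∧ B) : B ∧ A := ⟨h.2, h.1⟩
```

The type checker accepting these terms is what "verifying the proof" means here, which is presumably why people expect coding ability and proving ability to be related.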

mikert891mo ago

for now the LLMs will build off human understanding, eventually we will be left behind

Jeff_Brown1mo ago· 5 in thread

Can anyone find (or draw) a picture of the construction?

gibspaulding1mo ago

This is only a proof that a field with more connections is possible, not what it looks like.

I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
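
A toy version of that "assume it, then derive something impossible" shape, written in Lean 4. The statement is deliberately silly and has nothing to do with the actual conjecture; it is only meant to show the pattern, and it relies on the core lemma `Nat.lt_irrefl`.

```lean
-- Toy "conjecture": every natural number is less than 5.
-- Refutation: assume it holds, specialise it to n = 5, and get 5 < 5,
-- which contradicts irreflexivity of < on the naturals.
example : ¬ ∀ n : Nat, n < 5 :=
  fun h => absurd (h 5) (Nat.lt_irrefl 5)
```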

ninjha1mo ago

They only proved that one exists; computing the actual construction is non-obvious (the naive way to construct it is computationally infeasible).

pradn1mo ago

They have a "before" picture but not an "after"!

krackers1mo ago

See https://news.ycombinator.com/item?id=48233034

paulddraper1mo ago

Yeah, unfortunately, they just proved there existed a better solution, they didn't construct it.

(Though in some ways that's actually more impressive.)

mooreat1mo ago· 4 in thread

I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.

I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.

In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.

Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.

Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.

gus_massa1mo ago

Searching for a proof and disproof are sometimes not so different. In most cases, you nibble the borders to simplify the problem.

For example, to prove something is impossible let's say you first prove that there are only 5 families, and 4 of them are impossible. So now 80% of the problem is solved! :) If you are looking for counterexamples, the search is reduced 80% too. In both cases it may be useful.

In counterexamples you can make guesses and leaps and if it works it's fine. This is not possible for a proof.

On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.
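
To make the "guess, and if it works it's fine" point concrete, here is a small Python sketch using a classic toy claim: that n^2 + n + 41 is prime for every n >= 0. It is unrelated to the Erdős problem in the article, and the helper names (`is_prime`, `first_counterexample`, `euler_claim`) are my own. The point is only that a single checkable witness settles the question, and nobody needs to see the candidates that were tried and discarded.

```python
from typing import Callable, Iterable, Optional


def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(
    claim: Callable[[int], bool], candidates: Iterable[int]
) -> Optional[int]:
    """Return the first candidate for which the claim fails, or None."""
    for x in candidates:
        if not claim(x):
            return x
    return None


def euler_claim(n: int) -> bool:
    """Toy conjecture: n^2 + n + 41 is always prime."""
    return is_prime(n * n + n + 41)


witness = first_counterexample(euler_claim, range(1000))
print(witness)  # 40, since 40^2 + 40 + 41 = 1681 = 41^2
```

Checking the witness is a one-line computation, whereas a proof that the claim holds for all n would have had to cover every case.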

felipeerias1mo ago

One of the mathematicians in the video describes the process as:

> the AI has been able to explore all these possibilities much more comprehensibly, and doing that it found a path, it found a way to the solution.

Finding a counterexample of a mathematical conjecture strikes me as not that different from finding a vulnerability in a complex codebase.

stevefan19991mo ago

That's proof by contradiction: https://en.wikipedia.org/wiki/Reductio_ad_absurdum

davebren1mo ago

> I think that's just a matter of having them able to work on longer and longer time horizons.

No, this will never do the kind of math that humans did when coming up with complex numbers, or hell, just regular numbers ex nihilo. No matter how long it's given to combine things in its training data.

zozbot2341mo ago· 4 in thread

The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.

devttyeu1mo ago

That's here for anyone wondering - https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d...

estetlinus1mo ago

Today I generated the equivalent of two LOTR books just to fix three missing rows in my SQL models (and open a PR), so +1

Chamix1mo ago

I note that (though summarized), this is ~100k tokens. Anyone who routinely works with Codex (or any agentic harness really) can tell you how trivial it is to eat up 100k tokens doing complex work. I've personally had plenty of codex 5.5 xhigh sessions where just the pure chain of thought token count in a single turn exceeds 200k (and I assume doesn't go further only due to compaction meta-guidance; the harness will push the model to stay under 256k per turn/thinking block).

I think the more interesting question is how many tokens were spent all told; the most interesting graph in the article imo is the success rate by log test-time compute: how many tokens are being spent on the right of the graph to hit a winning CoT/solution like this >50% of the time?

stratos1231mo ago

AFAIK, extremely long CoT is fairly routine for those math-finetuned private models. Also, is Mythos unusually verbose compared to Opus?

recitedropper1mo ago· 4 in thread

This is impressive, no question.

Without knowing all this model has been trained on though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.

It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.

We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.

JacobAsmuth1mo ago

Exactly. Maybe OpenAI paid mathematicians to keep this discovery quiet, then added their proof to the training data, then manipulated a second team into prompting for this question such that the model could regurgitate the solution. This would plausibly explain why the model seems so capable at doing things like refuting fundamental theorems of mathematics while in things like competitive programming, biology, and physics it's merely only in the top 99.9%.

muhneesh1mo ago

This type of discourse is just inane and more reflective of the author's sensibilities than anything it claims.

Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.

Rover2221mo ago

Seems like a very tin-foil-hat-take to me

fergie1mo ago

> The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.

Really? Any references to read more?

Quentak1mo ago· 3 in thread

I'd like to know how many tokens in total went into solving this problem. Have they talked about this? It matters whether they got this result in 10 million tokens or 10 billion. Whether it's closer to 1 human working on this for 1 year or 1000 humans for 1 year. The news feels different when the probability of one AI run solving this is 1 in a thousand vs 1 in a million. Approximately I'm asking about the amount of money it cost to solve it, which has to include the failed parallel runs.
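
A rough way to frame that arithmetic, as a sketch only. It assumes runs are independent and each succeeds with the same probability p, so the expected number of runs until the first success is 1/p (geometric distribution) and the failed runs are paid for too. None of the numbers below come from OpenAI: the per-run token count and the price per million tokens are placeholders I made up, and the function names are hypothetical.

```python
def expected_runs(p_success: float) -> float:
    """Expected number of independent runs until the first success."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success


def expected_cost_usd(
    p_success: float, tokens_per_run: int, usd_per_million_tokens: float
) -> float:
    """Expected total spend, counting the failed runs as well."""
    total_tokens = expected_runs(p_success) * tokens_per_run
    return total_tokens * usd_per_million_tokens / 1_000_000


def prob_at_least_one_success(p_success: float, n_runs: int) -> float:
    """Chance that at least one of n independent runs succeeds."""
    return 1.0 - (1.0 - p_success) ** n_runs


# Placeholder inputs, NOT reported figures.
TOKENS_PER_RUN = 100_000
USD_PER_MILLION_TOKENS = 10.0

for p in (1e-3, 1e-6):
    cost = expected_cost_usd(p, TOKENS_PER_RUN, USD_PER_MILLION_TOKENS)
    print(f"p={p:g}: about {expected_runs(p):,.0f} runs, about ${cost:,.0f} expected")
```

With the same placeholder inputs, going from 1 in a thousand to 1 in a million multiplies the expected bill by a thousand, which is why the total token count matters so much for interpreting the result.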

_diyar1mo ago

Any answer to this question must also consider the current cost/token and its downward trajectory as algo and hardware advances drive down costs.

vessenes1mo ago

Good q. You can see that tantalizing graph where compute is displayed on a log axis and pass@1 goes up to like 50/60%. I’d love to know just how much compute is encoded on that axis.

I guess you can get some estimate from the excerpted CoT, but that CoT might be backed by quite a lot of parallel compute.

ApolloRising1mo ago

I would like to know this as well, including how long it was working on this problem.

dwroberts1mo ago· 2 in thread

Would be interesting to know what kind of preparatory work actually went into this - how long did it take to construct an input that produced a real result, and how much input did they get from actual mathematicians to guide refining it

lacewing1mo ago

Why?

It's clearly not yet a tool that can deliver new math at scale. I say this because otherwise, the headline would be that they proved / disproved a hundred conjectures, not one. This is what happened with Mythos. You want to be the AI company that "solved" math, just like Anthropic got the headlines for "solving" (or breaking?) security.

The fact they're announcing a single success story almost certainly means that they've thrown a lot of money at a lot of problems, had experts fine-tuning the prompts and verifying the results, and it came back with a single "hit". But that doesn't make the result less important. We now have a new "solver" for math that can solve at least some hard problems that weren't getting solved before.

Whether that spells the end of math as we know it... I don't think so, but math is a bit weird. It's almost entirely non-commercial: it's practiced chiefly in academia, subsidized from taxes or private endowments, and almost never meant to solve problems of obvious practical importance - so in that sense, it's closer to philosophy than, say, software engineering. No philosopher is seriously worried about LLMs taking philosopher jobs even though a chatbot can write an essay, but mathematicians painted themselves into a different corner, I think.

3 more replies

OkWing991mo ago

Says in the papers. "...which was first mathematically generated in one shot by an internal model at OpenAI, and then expositionally refined through human interactions with Codex."

The prep work doesn't really matter; what they say is that it's a one-shot result achieved by AI. The blog doesn't claim it was done by a currently public model.

1 more reply

Fraterkes1mo ago· 2 in thread

I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see

ausbah1mo ago

shame we won’t see any of these medical breakthroughs when we all lose our jobs and thus our healthcare

1 more reply

cubefox1mo ago

Breakthroughs in pure mathematics aren't scientific though. They tell us nothing about the world, and they are not useful.

CGMthrowaway1mo ago· 2 in thread

How do you even get an LLM to try to solve one of these problems? When I ask it just comes back with the name of the problem and saying "it can't be done"

lovecg1mo ago

By making it think for 100+ pages https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... Regular ChatGPT users don’t have a way to do that, this is something they do internally only.

edit: apparently that’s only the _condensed summary_ of the chain of thought.

2 more replies

KalMann1mo ago

Maybe you need to phrase it better. Like with a more specific direction of thinking.

dadrian1mo ago· 2 in thread

While the result is impressive, this blog post is extremely disappointing.

- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)

- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.

- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.

If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.

changoplatanero1mo ago

apparently the proof is not constructive in the sense of not giving an easy to compute recipe for generating a set of points that you can plot on a 2d plane

Al-Khwarizmi1mo ago

Indeed, it's a pity. While many advanced math problems are highly abstract or convoluted to explain to a layman audience, this one in particular is about points in a 2D plane and distances. A drawing would have been nice.

1 more reply

ccvannorman1mo ago· 1 in thread

I looked at all linked articles and could not find an example of the points (they show a square grid of points with n~=100 but no other ordering of points to show the more optimal layout(s)).

Is there anywhere an image example of a superior layout for example with n>={100,1000,10000}..? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.

lg56891mo ago

One of the authors said in a reddit comment (and I hope I am summarizing accurately) that it's impossible to show a diagram as the smallest instance of the technique gives like 10^1000000 points.

1 more reply

endymi0n1mo ago· 1 in thread

To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”

rhubarbtree1mo ago

Erdos, or the model?

__0x011mo ago· 1 in thread

From the companion paper:

> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.

Can someone please elaborate on this?

awdfeswavcra1mo ago

The last two are straightforward. The proof relies on a result called the Golod-Shafarevich theorem that gives a criterion for a group to be infinite. Golod and Shafarevich proved this a long time ago (1964). Moreover, if you look at how Golod and Shafarevich used this criterion, it's the same way it's used in the proof: They apply it to some Galois groups that appear in number theory, prove these are infinite in certain cases, and deduce that there exists an infinite tower of number fields with some surprising properties.

Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup).

The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in, because this work was not cited in the original AI proof. It has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem so their argument was slightly different, but similar.

ks20481mo ago· 1 in thread

Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."

woah.

missyougowers1mo ago

Unfortunately Gowers has taken Tao's lead on this one.

Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs

It is disheartening to see him jump into this GenAI puffery.

I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.

My guess is Gowers wants in on that action, or his University does.

Either way, it makes me sad. If it's self-motivated... even sadder.

5 more replies

purpleidea1mo ago· 1 in thread

You'd think a billion dollar company would be able to normalize the sound level on their video :/

llagerlof1mo ago

That will require AGI.

catigula1mo ago· 1 in thread

Every time I interact even with OpenAI's pro model, I am forced to come to the conclusion that anything outside the domain of specific technical problems is almost completely hopeless outside of a simple enhanced search and summary engine.

For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.

Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.

Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.

dvfjsdhgfv1mo ago

Yeah, I remember it was one of my biggest disappointments with LLMs.

precision1k1mo ago· 1 in thread

I see mixed emotions here. I understand both. On one hand it's exciting and fascinating. On the other it's concerning. One concern I haven't seen mentioned is the possibility that, as these models become larger and more powerful, their capability to solve frontier math problems will also grow. Does there become a point where humans are no longer the driver of innovation and research in this world, and instead are relegated to become stewards of the AI models whose purpose is to push the boundaries of mathematics, theoretical physics and other academic disciplines?

noslenwerdna1mo ago

For those of us who care about the answers to these questions, rather than who gets credit for doing it, we will welcome any faster means of solving these problems.

1 more reply

throwaway20271mo ago

Not to dismiss the AI, but the important part is that you still need someone able to recognize these solutions in the first place. A lot of things were hidden in plain sight before AI, but no one noticed, or no one had the framework, in maths or whatever field they specialize in, to recognize them.

zone4111mo ago

I actually tried using GPT-5.5 Pro on this problem recently. It thought it was making progress on one path, but it made so many mistakes that it didn't feel worth it pushing further. It'll be interesting to check whether it's the same route. I got partial results (proved in Lean) that improve on the best-known results for four Erdős problems with GPT-5.5 Pro

ferris-booler1mo ago

What strikes me in this case (and I haven't seen in other comments) is that it's a _disproof_ of a conjecture put forth by Erdős and supported (at least according to OpenAI) by other professional mathematicians. Erdős, one of the greats, thought that the limit was O(n^{1 + o(1)}), which GPT disproved.

We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).

zmmmmm1mo ago

As a side observation, it is striking but also not surprising in retrospect that the big successes in AI are coming from domains where things are fundamentally verifiable. Both software and math are either fully verifiable or low-cost verifiable (breaking a test is not the same cost as building a bridge and watching it fall down to see if it worked).

Other domains are extracting value but I feel like there's an order of magnitude difference. It raises the question, what other domains fit into these categories where the AI itself has pretty much free rein to verify its own results?

isolli1mo ago

Question:

The conjecture was about an upper bound for the maximum number of pairs. It has been disproven.

Was the Erdos problem the conjecture itself, or was it about the actual maximum number of pairs? (In which case it will probably never be solved.)

The problem is defined in the narrow version here: https://www.erdosproblems.com/90

dwa35921mo ago

A few questions that the blog did not answer; if anyone knows, that'd be great:

- Does anyone know if this was 1 minute of inference or 1 month?

- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?

- One of the graphs seems to say the model produced the right answer almost half the time at peak compute. Did I understand that right? What does peak compute mean here?

kevinwang1mo ago

Nitpicky/not important, but they say:

> Since loglog(n) tends to infinity with n, the additional term in the exponent tends to 0, meaning these constructions achieve growth only slightly faster than linear.

Would anyone else describe the previous asymptotic behavior like that? I mean obviously loglogn to O(1) is a quantum leap, but wouldn't you describe loglogn as "grows so slowly it's almost constant", so the constructions achieve growth "almost n^{1+c}"? But I guess that might be overcorrecting too hard.
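To put numbers on how slowly that extra term shrinks, here is a small sketch. It assumes the extra exponent has the form c / log(log(n)) with natural logs, and it takes c = 1 purely for illustration (the quoted passage does not give a constant, and `extra_exponent` is a made-up helper name). The input is log10(n) rather than n so that astronomically large n does not overflow.

```python
import math


def extra_exponent(log10_n, c=1.0):
    """Return c / ln(ln(n)), given log10(n) instead of n itself.

    c = 1.0 is an illustrative assumption, not a value from the article.
    """
    ln_n = log10_n * math.log(10)  # ln(n) = log10(n) * ln(10)
    return c / math.log(ln_n)


# n = 10^2, 10^6, 10^100 and 10^(10^6)
for log10_n in (2, 6, 100, 10**6):
    print(f"n = 10^{log10_n}: extra exponent ~ {extra_exponent(log10_n):.4f}")
```

The term does tend to 0, but only after n is far beyond anything you could draw, which is the tension between "slightly faster than linear" and "almost n^{1+c}" raised above.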

1 more reply

famouswaffles1mo ago

Another entry in a growing list from the last couple of months (interestingly mostly OpenAI):

1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...

There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.

Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...

2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/

3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...

4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...

5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...

analognoise1mo ago

Back when “term rewriting” was “AI”, multiple math tools were released that took known math facts and did tricks like uncovering new integrals - apply the pattern in some depth in a tree, see what pops out.

What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.

There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.

Once again, a tool is applied, it has the AI label - it's progress! But it isn’t something new. It’s just an LLM.

There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.

The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.

Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.

Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.

foota1mo ago

They should feed it the classification of finite simple groups and get it to simplify it/turn it more constructive.

Topology11mo ago

As someone starting grad school for pure mathematics, this has me both excited and nervous, but mainly the latter...

alansaber1mo ago

AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.

7 more replies

Kye1mo ago

Is this something that can be made explainable to someone without any of the relevant background, or is this one of those things where all that background is needed to understand it? Because I have no idea what's going on here, but would like to.

agentultra1mo ago

I’m curious about the “autonomous” claim. Usually these systems require a human to guide and verify steps, clarify problems, etc. are they claiming that the reinforcement model wasn’t given any inputs, tools, guidance, or training data from humans?

taimurshasan1mo ago

I wonder how much this cost vs a Math Professor or a team of Math Professors.

2 more replies

globulus20231mo ago

In the article there is a diagram of the “square grid” arrangement that achieves approximately 2n pairs of points separated by unit distance.

Can anyone point me to a diagram of what the newly found solution looks like?

globulus20231mo ago

In the article there is a nice clear diagram of the “square grid” arrangement that was previously thought to be optimal.

Can anyone point me to a diagram of the newly found optimal arrangement?

phkahler1mo ago

I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.
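The ~2n vs ~3n comparison above is easy to check by brute force. This is a minimal sketch, assuming a k x k patch of each lattice with spacing 1; the helper names are made up for illustration. Both counts are still linear in n, so neither lattice on its own beats the conjectured n^{1+o(1)} bound.

```python
from itertools import combinations
import math


def unit_pairs(points, tol=1e-9):
    """Count unordered pairs of points whose distance is 1 (within tol)."""
    return sum(
        1 for p, q in combinations(points, 2)
        if abs(math.dist(p, q) - 1.0) < tol
    )


def square_grid(k):
    """k x k patch of the integer lattice."""
    return [(float(i), float(j)) for i in range(k) for j in range(k)]


def triangular_grid(k):
    """k x k rhombus-shaped patch of the triangular lattice, spacing 1."""
    h = math.sqrt(3) / 2  # height of a unit equilateral triangle
    return [(i + 0.5 * j, h * j) for i in range(k) for j in range(k)]


k = 20
for name, pts in (("square", square_grid(k)), ("triangular", triangular_grid(k))):
    n = len(pts)
    pairs = unit_pairs(pts)
    print(f"{name}: n = {n}, unit pairs = {pairs}, pairs / n = {pairs / n:.2f}")
```

For a k x k patch the square grid has 2k(k-1) unit pairs and the triangular patch has (k-1)(3k-1), so the ratios approach 2 and 3 as k grows; the shortfall at small k comes from boundary points having fewer neighbours.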

3 more replies

num421mo ago

I am not surprised! The birth of computer science was rooted in the desire to automate mathematical discovery and proof writing.

armanj1mo ago

useless fact: there is no mention of "gpt" in this article. the ai is referred to as "An internal OpenAI model".

adt1mo ago

https://lifearchitect.ai/asi/

dev1ycan1mo ago

Wouldn't surprise me if they're just paying math geniuses to do math research and attribute it to AI models.

zuzululu1mo ago

This topic and discussion are out of my league. What is the implication here? LLMs aren't a dead end?

yusufozkan1mo ago

"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."

3 more replies

SubiculumCode1mo ago

I wonder whether there will be progress in string theory from these kinds of applications of AI.

solomatov1mo ago

How central is it in discrete geometry? Could anyone with knowledge of the field reply?

4 more replies

empath751mo ago

Important note: this was not done with a special mathematics harness or specialized workflow.

2 more replies

pizzao1mo ago

Can someone explain to me what is their "prompting-scaffolding" to make it work ?

1 more reply

aussieguy12341mo ago

So we've got the proof, what are the practical applications of this?

1 more reply

sinuhe691mo ago

How did they jump from finding counter-examples (disproof) to a proof?

momo261mo ago

I'm curious: disproving something by giving a counterexample is kind of easy. But can the model really prove something correctly and rigorously? Because right now it seems like all of its knowledge is based on existing material, and none of these models can prove a myth.

oscord1mo ago

Can it model a sustainable economy model, with human happiness and fulfilment indexes and planet preservation focus? Current capitalism and the red thing are so tired!

callamdelaney1mo ago

The only relevant question is, how much did it cost?

auggierose1mo ago

Which model did this? Is it available to the public?

_heimdall1mo ago

As this becomes more common it makes me wonder where the LLM ends and the harness begins.

The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.

anonymousDan1mo ago

Come back to me when it has solved P=NP.

iLoveOncall1mo ago

Absolutely no proof that any LLM actually found the result, and just a mention of an "internal model". Served to you by one of the biggest liars in the world.

Why would anyone believe this to be true even for a split second?

1 more reply

seydor1mo ago

can the AI please tell us what to do now that all knowledge work will become unemployment?

2 more replies

alsetmusic1mo ago

> AI is about to start taking a very serious role in the creative parts of research, and most importantly AI research itself. While this progress is not unexpected, it reinforces the urgency we feel about understanding this next phase of AI development, the challenges of aligning very intelligent systems, and the future of human-AI collaboration.

I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.

34228171mo ago

Nice. By the year 2100 200 Erdos problems will have been solved by AI. Let's build more data centers.

overgard1mo ago

I think it's worth being skeptical of this. There's an all-too-common pattern of "AI Lab Shows AI Doing Something Only Humans Can Do" only for a bunch of important caveats and limitations to be discovered after the initial hype. And of course, the correction never seems to be as viral as the hype. I'll believe it when a mathematician actually reads the 100+ pages of reasoning.

neuroelectron1mo ago

I wonder if it has anything to do with the fact that AI is a grid of grid-calculating grids. It seems like it would be especially well suited to finding solutions about grids. That is until you consider the fact that even 1 trillion billion grids is still not anywhere close to an infinite grid. So, probably slop.

somewhereoutth1mo ago

The real test would be if an LLM makes an important conjecture.

ai_fry_ur_brain1mo ago

I'm convinced they target these pure math problems because math is very occulted to the masses, and therefore they can use math "discoveries" as a way to make an LLM seem more impressive than it is.

Everything is a grift.

What are the odds that if they ran the same prompt from scratch, with the same context and instructions, it would arrive at the same answer? Unlikely. I think it's more likely that this is a 1:500000 chance and OpenAI can afford to brute force this result and justify the expense for marketing.

AlexToaniAI1mo ago

So nowadays AI can draw on different fields and make lots of breakthroughs that humans might not be able to make! That's nuts!

PeterStuer1mo ago

There's still parrots parroting 'stochastic parrot'.

arsan871mo ago

neato. can we do anything with this newfound knowledge or is this mathematical sports?

can we please put these groundbreaking AIs to work on actual problems humans have?

1 more reply

DiogenesKynikos1mo ago

Calling all LLM skeptics. How did a "stochastic parrot" just disprove an Erdős conjecture that mathematicians couldn't figure out for decades?

bradleykingz1mo ago

ok. so what are the implications for math?

1 more reply

fromMars1mo ago

Seems rather depressing to me but maybe I am a Luddite.

1 more reply

ninjagoo1mo ago

Many folks are upset about the supplanting of human effort by ai. Umanwizard voiced this valid concern below [1], but his comment got downvoted, unfairly, IMHO, instead of just being addressed. So putting out at least my response as its own top-level comment for visibility.

> the closer the expertise you spent your whole life building is to being worthless.

Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.

> Along with all the rest of what humans find meaningful and fulfilling.

Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow too many will enjoy learning from, and applying the coming advancements and leaps in many fields.

And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.

IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.

[1] https://news.ycombinator.com/item?id=48215122

brcmthrowaway1mo ago

End times are approaching

mrcwinn1mo ago

The back and forth in this discussion reveals to me we are sorting through a kind of philosophical debate about intelligence. That alone tells me LLMs are doing something novel.

voooduuuuu1mo ago

Ask an LLM to invent a new word and post it here. You will see that it simply combines words already in the training data.

10 more replies

atleastoptimal1mo ago

To all AI skeptics:

What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?

If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?

8 more replies

cwmoore1mo ago

From the meandering and self-loving article:

“For decades, it was widely believed that this rate was essentially the best possible, and no construction could improve significantly over the square grid. In technical terms, Erdős conjectured an upper bound of n^{1+o(1)} in which the additional o(1) indicates a term tending to 0 with n.

Our new result disproves this conjecture. More precisely, for infinitely many values of n, the proof constructs configurations of n points with at least n^{1+δ} unit-distance pairs, for some fixed exponent δ > 0. (The original AI proof does not give an explicit δ, but a forthcoming refinement due to Princeton mathematics professor Will Sawin has shown one can take δ = 0.014.)”

reactordev1mo ago

I dunno, I'm skeptical without proof. I've had the MAX+ plan for a while and I'm sorry, the quality between GPT vs Claude is night and day difference. Claude understands. GPT stumbles over every request I give it.

1 more reply



dvt1mo ago

> I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.

6 more replies

stego-tech1mo ago

As others have pointed out, both can be true:

* LLMs do just interpolate their training data, BUT-

* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data

2 more replies

midtake1mo ago

4 more replies

hammock1mo ago

Recombining existing material is exactly right, and in this case LLMs were uniquely positioned to make the connection quicker than any group of humans.

The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.

Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.

2 more replies

nomel1mo ago

1 more reply

thechao1mo ago

    (uv)(vu) = (uu)(vv)

Shows up as a primitive structure, quite often.

sillysaurusx1mo ago

3 more replies

mrandish1mo ago

I'm just hoping we're almost past this phase of needing to assess LLM capabilities against an arbitrary one-dimensional yardstick labeled 'Not Human' on one end and 'Beyond Human' on the other.

awesome_dude1mo ago

THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.

LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.

edit: I am going to go further

We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms

That much is derived from previous knowledge

What isn't, and it's what LLMs cannot create - is tools by which we can detect or see these incredibly small things

austinl1mo ago

E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.

1 more reply

libraryofbabel1mo ago

This is a good point, and there’s some deep philosophical questions there about the extent to which mathematics is invented or discovered. I personally hedge: it’s a bit of both.

smaudet1mo ago

If anything, this is more illustration of how llms are not useful to us...

They will do their own thing, don't need us. In fact, we will be in the way...

We can choose to study them and their output, but they don't make us better mathematicians...

2 more replies

anon2911mo ago

paulddraper1mo ago

"LLMs just interpolate their training data"

Cracks me up.

What exactly do we think that human brains do?

9 more replies

zerr1mo ago

There is a creational aspect in math - definitions and rules are created.

1 more reply

mejutoco1mo ago

It is the old discovery vs invention in mathematics.

Or if you prefer philosophy: Parmenides (nothing changes) vs Heraclitus (you cannot bathe twice in the same river, aka everything changes all the time).

oh_my_goodness1mo ago

We know that LLMs "just interpolate" their training data. Maybe there's a mystery about what "just interpolate" means when the data set gets enormous. But we know what LLMs do.

adam_arthur1mo ago

Pretty much everything that appears novel in life is derivative of other works or concepts.

You can watch a rock roll down a hill and derive the concept for the wheel.

Seems pretty self evident to me

jonahx1mo ago

> held that mathematical truths don’t report new facts about the world

I'm not as familiar with the early work, but later Wittgenstein held this belief too.

chr15m1mo ago

Side note: don't underestimate how much literal, physical time and energy "unfold" implies. Proofs occur on physical substrates.

block_dagger1mo ago

This is the second reference to Wittgenstein I’ve seen today in totally different contexts. Reminded me how much I vibe with his Tractatus.

cyanydeez1mo ago

I think someone should be talking to Gödel.

goldylochness1mo ago

this is an excellent point, new ground isn't necessarily novel, it's a rearrangement of existing pieces

BoredPositron1mo ago

Post hoc ergo propter hoc

voooduuuuu1mo ago

6 more replies

vatsachak1mo ago· 18 in thread

As I have stated before, AI will win a Fields Medal before it can manage a McDonald's

A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.

trostaft1mo ago

> A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

4 more replies

Terr_1mo ago

> manage a McDonald's

Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.

[0] https://en.wikipedia.org/wiki/Manna_(novel)

2 more replies

Lerc1mo ago

3 more replies

evenhash1mo ago

The proof is not written in Lean, though. It’s written in English and requires validation by human experts to confirm that it’s not gibberish.

1 more reply

jeremyjh1mo ago

Humans haven't learned as much new information about chess from Stockfish as we have from Leela.

1 more reply

soupspaces1mo ago

1 more reply

energy1231mo ago

The issue with this prediction is the gulf between problem-solving using known tools, versus creating new concepts for problems where existing tools aren't enough.

Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.

Timothy Gowers gives some commentary on this in the attached PDF.

auggierose1mo ago

> A difficult part was constructing a chess board on which to play math

KalMann1mo ago

1 more reply

forinti1mo ago

AI is already too old for that.

sigmoid101mo ago

3 more replies

volkercraig1mo ago

> we'll see more specialized math AI resembling StockFish soon

Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.

1 more reply

brikym1mo ago

Hey ChatGPT, if a person spills hot McCoffee on themselves who is at fault?

1 more reply

fapjacks1mo ago

I dunno. Is AI less than forty years old?

whimsicalism1mo ago

The only thing keeping the McDonald's from happening will be political; likewise with the Fields Medal.

ori_b1mo ago

We're automating art and science so that we can flip burgers. This future sucks.

2 more replies

segmondy1mo ago

our local AI models are already capable of running McDonalds.

2 more replies

dyauspitr1mo ago

1 more reply

trostaft1mo ago· 15 in thread

xbmcuser1mo ago

3 more replies

energy1231mo ago

https://youtu.be/Uc2zt198U_U?si=OkwO3xT8-zhSABwh

1 more reply

inciampati1mo ago

1 more reply

ontouchstart1mo ago

This is a very important point, especially when the output is from a non-deterministic random walk with some unknown probability distribution.

JohnHammersley1mo ago

Yes, I share your optimism overall, although I think it is raising a question of what the future role of the researcher is (much like the current debate on developer roles).

I wrote up a few notes [1], and most of the speaker videos are available via the conference website [2].

[1] https://scholarlyfutures.substack.com/p/ai-and-the-practical...

[2] https://www.newton.ac.uk/event/ooew11/

shalmanese1mo ago

If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.

isotypic1mo ago

qnleigh1mo ago

colordrops1mo ago

teiferer1mo ago

> Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced.

And by opening the door to LLM-generated results, you'll see greater and greater amounts without any hope of ever navigating this field again without machine help.

3 more replies

bandrami1mo ago

I am curious if LLMs are better at some kinds of problems than others. IIRC this and another big recent one were cases of the LLM producing a counterexample to a conjecture.

1 more reply

julianozen1mo ago

Nice response to read

Mikhail_K1mo ago

> But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier.

They do the opposite by locking the results they produce within a slop presentation that needs more AI to comprehend.

doctorpangloss1mo ago

2 more replies

umanwizard1mo ago

Why would it excite you, rather than terrifying you? The better LLMs get at math, the closer the expertise you spent your whole life building is to being worthless.

Along with all the rest of what humans find meaningful and fulfilling.

11 more replies

cpard1mo ago· 12 in thread

The proof brings unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question.

rjzzleep1mo ago

So the crossdomain pollination that used to exist in scientists is not only not encouraged. It's also actively punished by society.

3 more replies

freakynit1mo ago

And this is where machines, such as these reasoning LLMs, can help. Because they can remember patterns across many domains and try absolutely bonkers, weird connections and ideas.

I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.

doubledamio1mo ago

1 more reply

keyle1mo ago

I think you're on point, and you've explained it very well.

As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.

1 more reply

dhosek1mo ago

1 more reply

margorczynski1mo ago

I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.

1 more reply

efavdb1mo ago

It’s as if the body of human knowledge is our hive mind. It used to be expensive to access that, but no more.

Cool thing is now when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.

mxfh1mo ago

nashadelic1mo ago

There are so many research papers; just finding a solution to, say, a bio problem in a deep math paper would be a gold mine of opportunity. Very exciting times!

psb51mo ago

Check out Ashby's Law of Requisite Variety

make31mo ago

It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.

1 more reply

nicman231mo ago

like the research team that rediscovered calculus for treating diabetics

raincole1mo ago· 12 in thread

bananaflag1mo ago

As a mathematician, new, conceptual math is when I'll become interested in reading LLM output.

I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.

(This is not snobbery, just a personal preference.)

2 more replies

golol1mo ago

2 more replies

InsideOutSanta1mo ago

What I assumed they were saying is that their LLMs would be as intelligent as a human with a PhD across all, or at least most, knowledge tasks, and they clearly are not.

zamadatix1mo ago

2 more replies

necovek1mo ago

PhDs used to mean publishing a novel mathematical result: when has that changed?

4 more replies

turzmo1mo ago

1 more reply

xigoi1mo ago

dawnerd1mo ago

Yet it still codes like a junior developer that memorized all of Stack Overflow.

7 more replies

staticman21mo ago

2 more replies

zeofig1mo ago

I still laugh.

1 more reply

Valakas_1mo ago

And the goalposts will keep getting moved all the way to the singularity. And then those people will/would say "Oops. I was wrong."

xgulfie1mo ago

large language models do not have pigeon-level intelligence. They can't even feed themselves.

0x5FC31mo ago· 10 in thread

jltsiren1mo ago

2 more replies

bananaflag1mo ago

Erdos problems are easier to state, thus they make a great benchmark for the first year of AI mathematics.

tonfa1mo ago

Afaik this is because there is a community and database around them.

1 more reply

empath751mo ago

1 more reply

TrackerFF1mo ago

As others have written, Erdős was a lifelong curator of mathematical problems, from high-school level problems to the types that will land you a Fields medal. Like the Collatz conjecture.

Most new math problems appear in other papers, doctoral dissertations, etc. Usually you'll find them in the "future work" / "future research" section.

In short, it is a visibility problem.

But, I figure, there's some potential use in AI models to extract and present these problems, which would make them available to a larger audience.

That is exactly what Erdős did. His life revolved around math, and seeking mathematical questions.

throw-the-towel1mo ago

They're just famous because Erdos was a great mathematician, kinda like the Hilbert problems a century earlier.

famouswaffles1mo ago

It's not just Erdos problems - https://news.ycombinator.com/item?id=48213189

odie55331mo ago

I was promised a cure for cancer, but all I got was this disproof of an Erdos problem.

cold_harbor1mo ago

Erdos problems are well-posed for AI — elementary statements, exact counterexample targets, extensively catalogued. selection bias: these are exactly the problems AI can actually search

xyzsparetimexyz1mo ago

The models can't actually do good work on practical problems, so OpenAI tasks them with stuff nobody cares about

aurareturn1mo ago· 7 in thread

One thing that seems certain is that OpenAI models hold a distinct lead in academics over Anthropic and Google models.

For those in academics, is OpenAI the vendor of choice?

Jcampuzano21mo ago

OpenAI specifically targeted Academia a lot and gave out a lot of free/unlimited usage to top academics and universities/researchers.

They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.

tracerbulletx1mo ago

Hasn't AlphaFold been used to make real discoveries for a few years now?

1 more reply

karmasimida1mo ago

I think the mathematicians on X are all using GPT 5.5 Pro

1 more reply

bayindirh1mo ago

From my limited testing, Gemini can dig out hard-to-find information, provided you make your prompt detailed enough.

Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.

If I can't find something after a week of digging through the internet, I give it a colossal prompt, and it digs out what I'm looking for.

1 more reply

logicchains1mo ago

OpenAI models seem to have been trained on a lot of auto-generated theorem proving data; GPT 5.5 is really good at writing Lean.

FloorEgg1mo ago

As far as academic research is concerned (e.g. this thread's topic), I can't say.

4 more replies

causal1mo ago

A simpler explanation is that more people are using ChatGPT

libraryofbabel1mo ago· 7 in thread

This HN thread depressed me. I’m still thinking about why.

Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.

Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?

I wonder if we’ll still be doing this two years hence.

getnormality1mo ago

Yes, this and every internet forum will still be doing this two years hence. Your life will be better if you take to heart this famous passage from Nietzsche:

I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.

1 more reply

jryan491mo ago

People are afraid for their livelihood. What do you expect?

3 more replies

doginasuit1mo ago

> I wonder if we’ll still be doing this two years hence.

3 more replies

godelski1mo ago

I think part of it is that one side throws rocks, so it never even matters what is in the article. It becomes a battle over whether the article is good or the article is shit.

1 more reply

RedCinnabar1mo ago

What I don’t understand is why people dismiss this kind of progress with false claims. Especially when discussing programming, people start to act irrational using arguments from back in 2022.

I think that you can easily address your concerns about this new technology (since we all are concerned about the future) but at the same time acknowledge how revolutionary it is.

scosman1mo ago

We won’t be doing it in 2 years. By then my side will have won!

ex-aws-dude1mo ago

Let's just be real: it's because a lot of programmers' egos are built on intelligence/being a coding wizard, and this threatens that ego

If suddenly anyone can code we're not that special anymore.

throw-the-towel1mo ago· 6 in thread

See the longstanding debate on whether new math is "invented" or "discovered". Most mathematicians I knew thought it's discovered.

amelius1mo ago

This is like saying a sculpture always existed, the sculptor just had to remove the superfluous material.

Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.

Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.

5 more replies

skybrian1mo ago

Any design already exists as a possibility, so it could be said to be both invented and discovered, depending on how you look at it.

3 more replies

soupspaces1mo ago

Regardless of which, both Newton and Leibniz imprint in their findings a 'voice' and understanding different from each other and that of an LLM (for now?)

protoplancton1mo ago

ASalazarMX1mo ago

Math is an abstraction of reality, it had to be invented, so more inventions or discoveries could be made within it.

2 more replies

atmosx1mo ago

...long standing indeed. It can be traced back to Plato's works.

1 more reply

lesostep1mo ago· 5 in thread

I am cautious about AI "discoveries" after Mythos paper.

What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go or was there someone who pointed out mistakes?

How many attempts were made before a solution was found?

andy12_1mo ago

> Was the question asked by a mathematician?

> Was the paper right from the get-go or was there someone who pointed out mistakes?

> How many attempts were made before a solution was found?

[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...

0xDEAFBEAD1mo ago

>I am cautious about AI "discoveries" after Mythos paper.

Can you be more specific? I'm still under the impression that Mythos was a huge deal:

https://xcancel.com/hlntnr/status/2052479493801975987

1 more reply

rithdmc1mo ago

> I will eat my words if an AI oneshotted that one without any external help

like having a colleague peer review your paper, or bouncing ideas off a mentor before you write them down?

I agree there's a lot of AI marketing BS at the moment, but revising approaches based on feedback is a good thing.

1 more reply

crnkofe1mo ago

I'm also wondering about the process. What was the prompt, what they fed into the model, what it was trained on, etc. The article reads like a marketing post.

Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.

utopiah1mo ago

No need to be cautious.

So... that's also that, a piece of marketing to sell more of whatever their potential client can buy. It's not a piece of research. It's an ad. That's it.

lubujackson1mo ago· 5 in thread

For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.

daishi551mo ago

6 more replies

cubefox1mo ago

There is a long and interesting recent essay on that topic by a mathematician: https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...

4 more replies

kamaal1mo ago

Future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to your goal.

I don't see how math might not go that way as well.

anon2911mo ago

It is not only unsurprising; it was always expected. There is no difference between programs and proofs. They are the same thing.
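
A minimal Lean 4 sketch of the programs-as-proofs idea (the Curry-Howard correspondence) that this comment alludes to. The names `andSwapSketch` and `pairSwapSketch` are made up for illustration and do not come from the article or the paper; the point is only that, in a system like Lean, a proof term and an ordinary function can have the same shape.

```lean
-- A proof that A ∧ B implies B ∧ A: take the pair of proofs apart and swap it.
theorem andSwapSketch (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩

-- The same shape as an ordinary program on data: swap the components of a pair.
def pairSwapSketch {α β : Type} : α × β → β × α :=
  fun (a, b) => (b, a)
```

Whether this makes programs and proofs "the same thing" in every sense is the commenter's framing; the sketch only shows the correspondence for one small case.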

mikert891mo ago

for now the LLMs will build off human understanding, eventually we will be left behind

Jeff_Brown1mo ago· 5 in thread

Can anyone find (or draw) a picture of the construction?

gibspaulding1mo ago

This is only a proof that a field with more connections is possible, not what it looks like.

ninjha1mo ago

They only proved that one exists; computing the actual construction is non-obvious (the naive way to construct it is computationally infeasible).

pradn1mo ago

They have a "before" picture but not an "after"!

krackers1mo ago

See https://news.ycombinator.com/item?id=48233034

paulddraper1mo ago

Yeah, unfortunately, they just proved there existed a better solution, they didn't construct it.

(Though in some ways that's actually more impressive.)

mooreat1mo ago· 4 in thread

I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.

I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.

gus_massa1mo ago

Searching for a proof and disproof are sometimes not so different. In most cases, you nibble the borders to simplify the problem.

In counterexamples you can make guesses and leaps, and if it works it's fine. This is not possible for a proof.

On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.

3 more replies

felipeerias1mo ago

One of the mathematicians in the video describes the process as:

> the AI has been able to explore all these possibilities much more comprehensibly, and doing that it found a path, it found a way to the solution.

Finding a counterexample of a mathematical conjecture strikes me as not that different from finding a vulnerability in a complex codebase.

stevefan19991mo ago

That's proof by contradiction: https://en.wikipedia.org/wiki/Reductio_ad_absurdum

1 more reply

davebren1mo ago

> I think that's just a matter of having them able to work on longer and longer time horizons.

2 more replies

zozbot2341mo ago· 4 in thread

The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.

devttyeu1mo ago

That's here for anyone wondering - https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d...

2 more replies

estetlinus1mo ago

Today I generated the equivalent of two LOTR books just to fix three missing rows in my SQL models (and open a PR), so +1

1 more reply

Chamix1mo ago

stratos1231mo ago

AFAIK, extremely long CoT is fairly routine for those math-finetuned private models. Also, is Mythos unusually verbose compared to Opus?

recitedropper1mo ago· 4 in thread

This is impressive, no question.

We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.

JacobAsmuth1mo ago

4 more replies

muhneesh1mo ago

This type of discourse is just inane and more reflective of the author's sensibilities than anything it claims.

Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.

2 more replies

Rover2221mo ago

Seems like a very tin-foil-hat-take to me

3 more replies

fergie1mo ago

Really? Any references to read more?

1 more reply

Quentak1mo ago· 3 in thread

_diyar1mo ago

Any answer to this question must also consider the current cost/token and its downward trajectory as algo and hardware advances drive down costs.

vessenes1mo ago

Good q. You can see that tantalizing graph where compute is displayed on a log axis and pass@1 goes up to like 50/60%. I’d love to know just how much compute is encoded on that axis.

I guess you can get some estimate from the excerpted CoT, but that CoT might be backed by quite a lot of parallel compute.
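
For readers unfamiliar with the pass@1 number on that graph, here is a minimal Python sketch of the commonly used unbiased pass@k estimator. This is an assumption about how such pass rates are typically computed; the thread does not say how the blog post computed its figures, and the function name `pass_at_k` and the sample counts below are purely illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c out of n generated samples were correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    all_wrong = comb(n - c, k) / comb(n, k)
    return 1.0 - all_wrong


# Illustrative numbers only: 5 correct attempts out of 10.
print(pass_at_k(10, 5, 1))
print(pass_at_k(10, 5, 3))
```

For k = 1 the estimator reduces to the plain fraction of correct samples, which is how a "pass@1 of 50 to 60%" would be read under this assumption.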

ApolloRising1mo ago

I would like to know this as well, including how long was it working on this problem?

dwroberts1mo ago· 2 in thread

lacewing1mo ago

Why?

3 more replies

OkWing991mo ago

Says in the papers. "...which was first mathematically generated in one shot by an internal model at OpenAI, and then expositionally refined through human interactions with Codex."

Doesn't really matter the prep-work, what they say is it's a one-shot result, achieved by AI. The blog doesn't claim it was done by a currently public Model.

1 more reply

Fraterkes1mo ago· 2 in thread

I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see

ausbah1mo ago

shame we won’t see any of these medical breakthroughs when we all lose our jobs and thus our healthcare

1 more reply

cubefox1mo ago

Breakthroughs in pure mathematics aren't scientific though. They tell us nothing about the world, and they are not useful.

CGMthrowaway1mo ago· 2 in thread

How do you even get an LLM to try to solve one of these problems? When I ask it just comes back with the name of the problem and saying "it can't be done"

lovecg1mo ago

By making it think for 100+ pages https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... Regular ChatGPT users don’t have a way to do that, this is something they do internally only.

edit: apparently that’s only the _condensed summary_ of the chain of thought.

2 more replies

KalMann1mo ago

Maybe you need to phrase it better. Like with a more specific direction of thinking.

dadrian1mo ago· 2 in thread

While the result is impressive, this blog post is extremely disappointing.

- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)

- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.

If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.

changoplatanero1mo ago

apparently the proof is not constructive in the sense of not giving an easy to compute recipe for generating a set of points that you can plot on a 2d plane

Al-Khwarizmi1mo ago

1 more reply

ccvannorman1mo ago· 1 in thread

I looked at all linked articles and could not find an example of the points (they show a square grid of points with n~=100 but no other ordering of points to show the more optimal layout(s)).

Is there anywhere an image example of a superior layout for example with n>={100,1000,10000}..? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.

lg56891mo ago

One of the authors said in a reddit comment (and I hope I am summarizing accurately) that it's impossible to show a diagram as the smallest instance of the technique gives like 10^1000000 points.

1 more reply

endymi0n1mo ago· 1 in thread

To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”

rhubarbtree1mo ago

Erdos, or the model?

__0x011mo ago· 1 in thread

From the companion paper:

> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.

Can someone please elaborate on this?

awdfeswavcra1mo ago

ks20481mo ago· 1 in thread

Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."

woah.

missyougowers1mo ago

Unfortunately Gowers has taken Tao's lead on this one.

Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs

It is disheartening to see him jump into this GenAI puffery.

I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.

My guess is Gowers wants in on that action, or his University does.

Either way, it makes me sad. If it's self-motivated... even sadder.

5 more replies

purpleidea1mo ago· 1 in thread

You'd think a billion dollar company would be able to normalize the sound level on their video :/

llagerlof1mo ago

That will require AGI.

catigula1mo ago· 1 in thread

Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.

dvfjsdhgfv1mo ago

Yeah, I remember it was one of my biggest disappointments with LLMs.

precision1k1mo ago· 1 in thread

noslenwerdna1mo ago

For those of us who care about the answers to these questions, rather than who gets credit for doing it, we will welcome any faster means of solving these problems.

1 more reply

throwaway20271mo ago

zone4111mo ago

ferris-booler1mo ago

zmmmmm1mo ago

isolli1mo ago

Question:

The conjecture was about an upper bound for the maximum number of pairs. It has been disproven.

Was the Erdos problem the conjecture itself, or was it about the actual maximum number of pairs? (In which case it will probably never be solved.)

The problem is defined in the narrow version here: https://www.erdosproblems.com/90

dwa35921mo ago

A few questions that the blog did not answer; if anyone knows, that'd be great:

- Does anyone know if this was a 1 minute of inference or 1 month?

- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?

- One of the graphs says the model produced the right answer almost half the time at peak compute? Did I understand that right? What does peak compute mean here?

kevinwang1mo ago

Nitpicky/not important, but they say:

> Since loglog(n) tends to infinity with n, the additional term in the exponent tends to 0, meaning these constructions achieve growth only slightly faster than linear.
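
A worked version of the quoted sentence, assuming "these constructions" refers to the classical lower bound of the form n^(1 + c/loglog n) unit distances with some constant c > 0 (the quote itself does not spell out the formula):

```latex
\[
  n^{\,1 + \frac{c}{\log\log n}}
  \;=\; n \cdot \exp\!\left(\frac{c\,\log n}{\log\log n}\right),
  \qquad
  \frac{c}{\log\log n} \;\longrightarrow\; 0
  \quad (n \to \infty).
\]
\[
  \text{For every fixed } \varepsilon > 0:\qquad
  n^{\,1 + \frac{c}{\log\log n}} \;\le\; n^{\,1+\varepsilon}
  \quad \text{for all sufficiently large } n,
\]
\[
  \text{while}\qquad
  \frac{n^{\,1 + \frac{c}{\log\log n}}}{n}
  \;=\; \exp\!\left(\frac{c\,\log n}{\log\log n}\right)
  \;\longrightarrow\; \infty .
\]
```

So under that assumption the count grows faster than any constant multiple of n, yet more slowly than n^(1+ε) for every fixed ε > 0, which is one way to read "only slightly faster than linear".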

1 more reply

famouswaffles1mo ago

Another entry in a growing list of the last couple months (interestingly mostly OpenAI):

1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...

There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.

Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...

2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/

3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...

4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...

5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...

analognoise1mo ago

What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.

There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.

Once again, a tool is applied, it has the AI label, so it's progress! But it isn’t something new. It’s just an LLM.

The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.

Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.

Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.

foota1mo ago

They should feed it the classification of finite simple groups and get it to simplify it/turn it more constructive.

Topology11mo ago

As someone starting grad school for pure mathematics, this has me both excited and nervous, but mainly the latter...

alansaber1mo ago

AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.

7 more replies

Kye1mo ago

agentultra1mo ago

taimurshasan1mo ago

I wonder how much this cost vs a Math Professor or a team of Math Professors.

2 more replies

globulus20231mo ago

In the article there is a diagram of the “square grid” arrangement that achieves approximately 2n pairs of points separated by unit distance.

Can anyone point me to a diagram of what the newly found solution looks like?

globulus20231mo ago

In the article there is a nice clear diagram of the “square grid” arrangement that was previously thought to be optimal.

Can anyone point me to a diagram of the newly found optimal arrangement?

phkahler1mo ago

I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.
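
A minimal Python sketch that checks the ~2n vs ~3n counts from this comment by brute force. The helper names (`lattice_patch`, `count_unit_pairs`, and so on) are hypothetical, and the k-by-k patch shape is an assumption made for simplicity. Points are stored as integer lattice coordinates so the unit-distance test is exact: for the triangular lattice with basis vectors (1, 0) and (1/2, sqrt(3)/2), the squared length of a step (da, db) is da^2 + da*db + db^2.

```python
from itertools import combinations


def lattice_patch(k):
    """Integer coordinates (a, b) with 0 <= a, b < k, so n = k * k points."""
    return [(a, b) for a in range(k) for b in range(k)]


def sq_dist_square(p, q):
    """Squared Euclidean distance when (a, b) are square-grid coordinates."""
    da, db = p[0] - q[0], p[1] - q[1]
    return da * da + db * db


def sq_dist_triangular(p, q):
    """Squared Euclidean distance when (a, b) are coordinates in the
    triangular-lattice basis (1, 0), (1/2, sqrt(3)/2)."""
    da, db = p[0] - q[0], p[1] - q[1]
    return da * da + da * db + db * db


def count_unit_pairs(points, sq_dist):
    """Count unordered pairs of points at distance exactly 1."""
    return sum(1 for p, q in combinations(points, 2) if sq_dist(p, q) == 1)


k = 10
pts = lattice_patch(k)  # n = 100 points in both cases
print("square grid:", count_unit_pairs(pts, sq_dist_square))
print("triangular lattice:", count_unit_pairs(pts, sq_dist_triangular))
```

On these patches the counts are 2k(k-1) for the square grid and 3k^2 - 4k + 1 for the triangular one, i.e. roughly 2n and 3n. Both are linear in n, so the triangular lattice only improves the constant factor; it does not change the growth rate being discussed in the article.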

3 more replies

num421mo ago

I am not surprised! The birth of computer science was rooted in the desire to automate mathematical discovery and proof writing.

armanj1mo ago

Useless fact: there is no mention of "gpt" in this article. The AI is referred to as "An internal OpenAI model".

adt1mo ago

https://lifearchitect.ai/asi/

dev1ycan1mo ago

Wouldn't surprise me if they're just paying math geniuses to do math research and attribute it to AI models.

zuzululu1mo ago

This topic and discussion is out of my league. What is the implication here? LLMs aren't a dead end?

yusufozkan1mo ago

3 more replies

SubiculumCode1mo ago

I wonder whether there will be progress in string theory from these kinds of applications of AI.

solomatov1mo ago

How central is it in discrete geometry? Could anyone with knowledge of the field reply?

4 more replies

empath751mo ago

Important note: this was not done with a special mathematics harness or specialized workflow.

2 more replies

pizzao1mo ago

Can someone explain to me what their "prompting-scaffolding" is to make it work?

1 more reply

aussieguy12341mo ago

So we've got the proof, what are the practical applications of this?

1 more reply

sinuhe691mo ago

How did they jump from finding counter-examples (disproof) to a proof?

momo261mo ago

oscord1mo ago

Can it model a sustainable economy model, with human happiness and fulfilment indexes and planet preservation focus? Current capitalism and the red thing are so tired!

callamdelaney1mo ago

The only relevant question is, how much did it cost?

auggierose1mo ago

Which model did this? Is it available to the public?

_heimdall1mo ago

As this becomes more common it makes me wonder where the LLM ends and the harness begins.

anonymousDan1mo ago

Come back to me when it has solved P=NP.

iLoveOncall1mo ago

Absolutely no proof that any LLM actually found the result, and just a mention of an "internal model". Served to you by one of the biggest liars in the world.

Why would anyone believe this to be true even for a split second?

1 more reply

seydor1mo ago

can the AI please tell us what to do now that all knowledge work will become unemployment?

2 more replies

alsetmusic1mo ago

I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.

34228171mo ago

Nice. By the year 2100 200 Erdos problems will have been solved by AI. Let's build more data centers.

overgard1mo ago

neuroelectron1mo ago

somewhereoutth1mo ago

The real test would be if an LLM makes an important conjecture.

ai_fry_ur_brain1mo ago

I'm convinced they target these pure math problems because math is very opaque to the masses, and they can therefore use math "discoveries" as a way to make an LLM seem more impressive than it is.

Everything is a grift.

AlexToaniAI1mo ago

So nowadays AI may draw on different fields and achieve lots of breakthroughs that humans might not be able to! That's nuts!

PeterStuer1mo ago

There are still parrots parroting 'stochastic parrot'.

arsan871mo ago

Neato. Can we do anything with this newfound knowledge, or is this mathematical sports?

Can we please put these groundbreaking AIs to work on actual problems humans have?

1 more reply

DiogenesKynikos1mo ago

Calling all LLM skeptics. How did a "stochastic parrot" just disprove an Erdős conjecture that mathematicians couldn't figure out for decades?

bradleykingz1mo ago

OK, so what are the implications for math?

1 more reply

fromMars1mo ago

Seems rather depressing to me but maybe I am a Luddite.

1 more reply

ninjagoo1mo ago

> the closer the expertise you spent your whole life building is to being worthless.

> Along with all the rest of what humans find meaningful and fulfilling.

[1] https://news.ycombinator.com/item?id=48215122

brcmthrowaway1mo ago

End times are approaching

mrcwinn1mo ago

The back and forth in this discussion reveals to me we are sorting through a kind of philosophical debate about intelligence. That alone tells me LLMs are doing something novel.

voooduuuuu1mo ago

Ask an LLM to invent a new word and post it here. You will see that it simply combines words already in the training data.

10 more replies

atleastoptimal1mo ago

To all AI skeptics:

What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?

If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?

8 more replies

cwmoore1mo ago

From the meandering and self-loving article:

reactordev1mo ago

1 more reply
