GPT-5.2 derives a new result in theoretical physics (opens in new tab)

(openai.com)

574 pointsdavidbarker2mo ago401 comments

401 comments

The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post, humans started off trying to solve some problem, it got complex, GPT simplified it and found a solution with the simpler representation. It took 12 hours for GPT pro to do this. In my experience LLM’s can make new things when they are some linear combination of existing things but I haven’t been to get them to do something totally out of distribution yet from first principles.

CGMthrowaway2mo ago

This is the critical bit (paraphrasing):

Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions, which correspond to a “Feynman diagram expansion” whose complexity grows superexponentially in n. But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms. And from these base cases, no one was then able to spot a pattern and posit a formula valid for all n. GPT did that.

Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

godelski2mo ago

  > I think this was all already figured out in 1986 though

They cite that paper in the third paragraph...

  Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.

It also seems to be a main talking point.

I think this is a prime example of where it is easy to think something is solved when looking at things from a high level but making an erroneous conclusion due to lack of domain expertise. Classic "Reviewer 2" move. Though I'm not a domain expert and so if there was no novelty over Parke and Taylor I'm pretty sure this will get thrashed in review.

CGMthrowaway2mo ago

You're right. Parke & Taylor showed the simplest nonzero amplitudes have two minus helicities while one-minus amplitudes vanish (generically). This paper claims that vanishing theorem has a loophole - a new hidden sector exists and one-minus amplitudes are secretly there, but distributional

1 more reply

btown2mo ago

It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.

utopiah2mo ago

> modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite.

Feel like it's a bit what I tried to expressed few weeks ago https://news.ycombinator.com/item?id=46791642 namely that we are just pouring computational resources at verifiable problems then claim that astonishingly sometimes it works. Sure LLMs even have a slight bias, namely they do rely on statistics so it's not purely brute force but still the approach is pretty much the same : throw stuff at the wall, see what sticks, once something finally does report it as grandiose and claim to be "intelligent".

2 more replies

QuercusMax2mo ago

Yes, this is where I just cannot imagine completely AI-driven software development of anything novel and complicated without extensive human input. I'm currently working in a space where none of our data models are particularly complex, but the trick is all in defining the rules for how things should work.

Our actual software implementation is usually pretty simple; often writing up the design spec takes significantly longer than building the software, because the software isn't the hard part - the requirements are. I suspect the same folks who are terrible at describing their problems are going to need help from expert folks who are somewhere between SWE, product manager, and interaction designer.

D-Machine2mo ago

Even more generally than verification, just being tied to a loss function that represent something we actually care about. E.g. compiler and test errors, LEAN verification in Aristotle, basic physics energy configs in AlphaFold, or win conditions in e.g. RL, such as in AlphaGo.

RLHF is an attempt to push LLMs pre-trained with a dopey reconstruction loss toward something we actually care about: imagine if we could find a pre-training criterion that actually cared about truth and/or plausibility in the first place!

1 more reply

lupsasca2mo ago

That paper from the 80s (which is cited in the new one) is about "MHV amplitudes" with two negative-helicity gluons, so "double-minus amplitudes". The main significance of this new paper is to point out that "single-minus amplitudes" which had previously been thought to vanish are actually nontrivial. Moreover, GPT-5.2 Pro computed a simple formula for the single-minus amplitudes that is the analogue of the Parke-Taylor formula for the double-minus "MHV" amplitudes.

woeirua2mo ago

You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...

emp173442mo ago

You should be more skeptical of marketing releases like this. This is an advertisement.

coldtea2mo ago

It's hard to get someone to do literature first when they get free publicity by not doing literature search and claiming some major AI assisted breakthrough...

Heck, it's hard to get authors to do literature search, period: never mind not thoroughly looking for prior art, even well known disgraced papers get citated continue to get possitive citations all the time...

godelski2mo ago

They also reference Parke and Taylor. Several times...

suuuuuuuu2mo ago

Don't underestimate the willingness of physicists to skimp on literature review.

1 more reply

helterskelter2mo ago

> But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms.

Slightly OT, but wasn't this supposed to be largely solved with amplituhedrons?

ericmay2mo ago

Still pretty awesome though, if you ask me.

fsloth2mo ago

I think even “non-intelligent” solver like Mathematica is cool - so hell yes, this is cool.

_aavaa_2mo ago

Big difference between “derives new result” and “reproduces something likely in its training dataset”.

nine_k2mo ago

Sounds somehow similar to the groundbreaking application of a computer to prove the 4 color theorem. Then the researchers wrote a program to find and formally prove the numerous particular cases. Here the computer finds a simplifying pattern.

torginus2mo ago

I'm not sure if GPTs ability goes beyond a formal math package's in this regard or its just its just way more convienient to ask ChatGPT rather than using these software.

randomtoast2mo ago

> but I haven’t been to get them to do something totally out of distribution yet from first principles

Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.

davorak2mo ago

> Can humans actually do that?

From my reading yes, but I think I am likely reading the statement differently than you are.

> from first principles

Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.

For an llm to follow a first principles strategy I would expect it to take in a body of research, come up with some first principles or guess at them, then iteratively construct and tower of reasonings/findings/experiments.

Constructing a solid tower is where things are currently improving for existing models in my mind, but when I try openai or anthropic chat interface neither do a good job for long, not independently at least.

Humans also often have a hard time with this in general it is not a skill that everyone has and I think you can be a successful scientist without ever heavily developing first principles problem solving.

dsign2mo ago

"Constructing a solid tower" from first principles is already super-human level. Sure, you can theorize a tower (sans the "solid") from first principles; there's a software architect at my job that does it every day. But the "solid" bit is where things get tricky, because "solid" implies "firm" and "well anchored", and that implies experimental grounds, experimental verification all the way, and final measurable impact. And I'm not even talking particle physics or software engineering; even folding a piece of paper can give you surprising mismatches between theory and results.

Even the realm of pure mathematics and elegant physic theories, where you are supposed to take a set of axioms ("first principles") and build something with it, has cautionary tales such as the Russel paradox or the non-measure of Feymann path integrals, and let's not talk about string theory.

samrus2mo ago

Yes. Thats how all advancement in human knowledge happened. Small and incremental forays out of our training distribution.

These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.

LLMs have not shown to be capable of this. They might be in the future, but they havent yet

dotancohen2mo ago

Relativity comes to mind.

You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relatively as well.

Paracompact2mo ago

I am not a scientific historian, or even a physicist, but IMO relativity has a weak case for being a completely novel discovery. Critique of absolute time and space of Newtonian physics was already well underway, and much of the methodology for exploring this relativity (by way of gyroscopes, inertial reference frames, and synchronized mechanical clocks) were already in parlance. Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory. Poincare probably came the closest to unifying everything before Einstein:

> In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.

https://en.wikipedia.org/wiki/History_of_special_relativity

Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)

7 more replies

DonaldFisk2mo ago

Agreed.

General relativity was a completely novel idea. Einstein took a purely mathematical object (now known as the Einstein tensor), and realized that since its coveriant derivative was zero, it could be equated (apart fron a constant factor) to a conserved physical object, the energy momentum tensor (except for a constant factor). It didn't just fall out of Riemannian geometry and what was known about physics at the time.

Special relativity was the work of several scientists as well as Einstein, but it was also a completely novel idea - just not the idea of one person working alone.

I don't know why anyone disputes that people can sometimes come up with completely novel ideas out of the blue. This is how science moves forward. It's very easy to look back on a breakthrough and think it looks obvious (because you know the trick that was used), but it's important to remember that the discoverer didn't have the benefit of hindsight that you have.

johnfn2mo ago

Even if I grant you that, surely we’ve moved the goal posts a bit if we’re saying the only thing we can think of that AI can’t do is the life’s work of a man who’s last name is literally synonymous with genius.

poplarsol2mo ago

That's not exactly true. Lorentz contraction is a clear antecedent to special relativity.

2 more replies

lamontcg2mo ago

Not really. Pretty sure I read recently that Newton appreciated that his theory was non-local and didn't like what Einstein later called "spooky action at a distance". The Lorentz transform was also known from 1887. Time dilation was understood from 1900. Poincaré figured out in 1905 that it was a mathematical group. Einstein put a bow on it all by figuring out that you could derive it from the principle of relativity and keeping the speed of light constant in all inertial reference frames.

I'm not sure about GR, but I know that it is built on the foundations of differential geometry, which Einstein definitely didn't invent (I think that's the source of his "I assure you whatever your difficulties in mathematics are, that mine are much greater" quote because he was struggling to understand Hilbert's math).

And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...

2 more replies

CooCooCaCha2mo ago

Depends on what you think is valid.

The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.

So it depends on if you’re comparing individual steps or just the starting/ending distributions.

tjr2mo ago

Go enough shoulders down, and someone had to have been the first giant.

nextaccountic2mo ago

Probably not homo sapiens.. other hominids older than us developed a lot of technology

Kuxe2mo ago

A discovery by a giant is in some sense a new base vector in the space of discoveries. The interesting question is if a statistical machine can only perform a linear combination in the space of discoveries, or if a statistical machine can discover a new base vector in the space of discoveries.. whatever that is.

1 more reply

pram2mo ago

Pythagoras is the turtle.

1 more reply

utopiah2mo ago

Arguably it's precisely a paradigm shift. Continuing whatever worked until now is within the paradigm, our current theories and tools works, we find few problems that don't fit but that's fine the rest is still progress, we keep on hitting more problems or those few pesky unsolved problems actually appear to be important. We then go back to the theory and its foundations and finally challenge them. We break from the old paradigm and come up with new theories and tools because the first principles are now better understood and we iterate.

So that's actually 2 different regimes on how to proceed. Both are useful but arguably breaking off of the current paradigm is much harder and thus rare.

D-Machine2mo ago

The tricky part is that LLMs aren't just spewing outputs from the distribution (or "near" learned manifolds), but also extrapolating / interpolating (depending on how much you care about the semantics of these terms https://arxiv.org/abs/2110.09485).

There are genuine creative insights that come from connecting two known semantic spaces in a way that wasn't obvious before (e.g, novel isomorphism). It is very conceivable that LLMs could make this kind of connection, but we haven't really seen a dramatic form of this yet. This kind of connection can lead to deep, non-trivial insights, but whether or not it is "out-of-distribution" is harder to answer in this case.

tshaddox2mo ago

I mean, there’s just no way you can take the set of publicly known ideas from all human civilizations, say, 5,000 years ago, and say that all the ideas we have now were “in the distribution” then. New ideas actually have to be created.

godelski2mo ago

  > Can humans actually do that?

Yes

Seriously, think about it for a second...

If that were true then science should have accelerated a lot faster. Science would have happened differently and researchers would have optimized to trying to ingest as many papers as they can.

Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.

I think the problem is when teaching science we generally teach it very linearly. As if things easily follow. But in reality there is generally constant iterative improvements but they more look like a plateau, then there are these leaps. They happen for a variety of reasons but no paradigm shift would be contentious if it was obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements are met with "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If it was always in distribution progress would be nearly frictionless. Again, with history in how we teach science we make an error in teaching things like Galileo, as if The Church was the only opposition. There were many scientists that objected, and on reasonable grounds. It is also a problem we continually make in how we view the world. If you're sticking with "it works" you'll end up with a geocentric model rather than a heliocentric model. It is true that the geocentric model had limits but so did the original heliocentric model and that's the reason it took time to be adopted.

By viewing things at too high of a level we often fool ourselves. While I'm criticizing how we teach I'll also admit it is a tough thing to balance. It is difficult to get nuanced and in teaching we must be time effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that it is hard to learn how to actually do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).

  > We are always building on the shoulders of giants.

And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.

https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...

  You are greater than the sum of your parts

stouset2mo ago

When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.

Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.

MITSardine2mo ago

There's a major difference between chess and scientific research: setting the objectives is itself part of the work.

In chess, there's a clear goal: beat the game according to this set of unambiguous rules.

In science, the goals are much more diffuse, and setting those in the first place is what makes a scientist more or less successful, not so much technical ability. It's a very hierarchical field where permanent researchers direct staff (postdocs, research scientists/engineers), direct grad students. And it's at the bottom of the pyramid where the technical ability is the most relevant/rewarded.

Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.

bluecalm2mo ago

The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically so humans could guide them. With new NN based engines they were amazing strategically but they sucked tactically (first versions of Leela Chess Zero). Today they closed the gap and are amazing at both strategy and tactics and there is nothing humans can contribute anymore - all that is left is to just watch and learn.

1 more reply

TGower2mo ago

With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".

stouset2mo ago

That is not what happened with chess engines. We didn’t just throw better hardware at it, we found new algorithms, improved the accuracy and performance of our position evaluation functions, discovered more efficient data structures, etc.

People have been downplaying LLMs since the first AI-generated buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.

By all means, keep betting against them.

baq2mo ago

Chess grandmasters are living proof that it’s possible to reach grandmaster level in chess on 20W of compute. We’ve got orders of magnitude of optimizations to discover in LLMs and/or future architectures, both software and hardware and with the amount of progress we’ve got basically every month those ten people will answer ‘we don’t know, but it won’t be too long’. Of course they may be wrong, but the trend line is clear; Moore’s law faced similar issues and they were successively overcome for half a century.

IOW respect the trend line.

blt2mo ago

And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic

NitpickLawyer2mo ago

> With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.

And the same practitioners said right after deep blue that go is NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...

guluarte2mo ago

so we are going back to physical labor then

empath752mo ago

We are already at stage 3 for software development and arguably step 4

zarzavat2mo ago

We are at level 2.5 for software development, IMO. There is a clear skill gap between experienced humans and LLMs when it comes to writing maintainable, robust, concise and performant code and balancing those concerns.

The LLMs are very fast but the code they generate is low quality. Their comprehension of the code is usually good but sometimes they have a weightfart and miss some obvious detail and need to be put on the right path again. This makes them good for non-experienced humans who want to write code and for experienced humans who want to save time on easy tasks.

1 more reply

bpodgursky2mo ago

I don't want to be rude but like, maybe you should pre-register some statement like "LLMs will not be able to do X" in some concrete domain, because I suspect your goalposts are shifting without you noticing.

We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.

RandomLensman2mo ago

I don't know enought about theoretical physics: what makes it a significant contribution there?

terminalbraid2mo ago

It's a nontrivial calculation valid for a class of forces (e.g. QCD) and apparently a serious simplification to a specific calculation that hadn't been completed before. But for what it's worth, I spent a good part of my physics career working in nucleon structure and have not run across the term "single minus amplitudes" in my memory. That doesn't necessarily mean much as there's a very broad space work like this takes place in and some of it gets extremely arcane and technical.

One way I gauge the significance of a theory paper are the measured quantities and physical processes it would contribute to. I see none discussed here which should tell you how deep into math it is. I personally would not have stopped to read it on my arxiv catch-up

https://arxiv.org/list/hep-th/new

Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.

1 more reply

epolanski2mo ago

Not every contribution has immediate impact.

1 more reply

outlace2mo ago

I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.

nozzlegear2mo ago

> We're talking about significant contributions to theoretical physics.

Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.

famouswaffles2mo ago

If this worked for 12 hours to derive the simplified formula along with its proof then it guided itself and made significant contributions by any useful definition of the word, hence Open AI having an author credit.

1 more reply

bpodgursky2mo ago

If a helicopter drops someone off on the top of Mount Everest, it's reasonable to say that the helicopter did the work and is not just a tool they used to hike up the mountain.

1 more reply

emil-lp2mo ago

"GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

Probably not something that the average GI Joe would be able to prompt their way to...

I am skeptical until they show the chat log leading up to the conjecture and proof.

Sharlin2mo ago

I'm a big LLM sceptic but that's… moving the goalposts a little too far. How could an average Joe even understand the conjecture enough to write the initial prompt? Or do you mean that experts would give him the prompt to copy-paste, and hope that the proverbial monkey can come up with a Henry V? At the very least posit someone like a grad student in particle physics as the human user.

buttered_toast2mo ago

I would interpret it as implying that the result was due to a lot more hand-holding that what is let on.

Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?

Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions.

These aren't my personal views, but there is some handwaving about the process in such a way that reads as if this was all spontaneous involvement on GPTs end.

But regardless, a result is a result so I'm content with it.

1 more reply

lamontcg2mo ago

That's kinda the whole point.

SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.

The optimization algorithm was used by human experts to solve the problem.

1 more reply

slopusila2mo ago

hey, GPT, solve this tough conjecture I've read about on Quanta. make no mistakes

1 more reply

jmalicki2mo ago

"Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

Is this so different?

sejje2mo ago

The Average Joe reads at an 8th grade level. 21% are illiterate in the US.

LLMs surpassed the average human a long time ago IMO. When LLMs fail to measure up to humans, it's that they fail to measure up against human experts in a given field, not the Average Joe.

We are surrounded by NPCs.

famouswaffles2mo ago

The paper has all those prominent institutions who acknowledge the contribution so realistically, why would you be skeptical ?

kristopolous2mo ago

they probably also acknowledge pytorch, numpy, R ... but we don't attribute those tools as the agent who did the work.

I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

2 more replies

Refreeze52242mo ago

Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?

1 more reply

slibhb2mo ago

> In my experience LLM’s can make new things when they are some linear combination of existing things but I haven’t been to get them to do something totally out of distribution yet from first principles.

What's the distinction between "first principles" and "existing things"?

I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (that is quite rare for humnans too).

hellisad2mo ago

Hmm feels a bit trivializing, we don't know exactly how difficult it was to come up with the generic set of equations mentioned from the human starting point.

I can claim some knowledge of physics from my degree, typically the easy part is coming up with complex dirty equations that work under special conditions, the hard part is the simplification into something elegant, 'natural' and general.

Also "LLM’s can make new things when they are some linear combination of existing things"

Doesn't really mean much, what is a linear combination of things you first have to define precisely what a thing is?

epolanski2mo ago

Serious questions, I often hear about this "let the LLM cook for hours" but how do you do that in practice and how does it manages its own context? How doesn't it get lost at all after so many tokens?

lovecg2mo ago

I’m guessing, would love someone who has first hand knowledge to comment. But my guess is it’s some combination of trying many different approaches in parallel (each in a fresh context), then picking the one that works, and splitting up the task into sequential steps, where the output of one step is condensed and is used as an input to the next step (with possibly human steering between steps)

javier1234543212mo ago

From what I've seen is a process of compacting the session once it reaches some limit, which basically means summarizing all the previous work and feeding it as the initial prompt for the next session.

8note2mo ago

the annoying part is that with tool calls, a lot of those hours is time spent on netowrk round trips.

over long periods of time, checklists are the biggest thing, so the LLM can track whats already done and whats left. after a compact, it can pull the relevant stuff back up and make progress.

having some level or hierarchy is also useful - requirements, high level designs, low level designs, etc

anon2912mo ago

Very very few human individuals are capable of making new things that are not a linear combination of existing things. Even such things as special relativity were an application of two previous ideas. All of special relativity is deriveable from the principles of relative motion (known into antiquity) and the constant speed of light (which was known to Einstein). From there it is a straightforwards application of the Pythagorean theorem to realize there is a contradiction and the lorentz factor falls out naturally via basic algebra.

tedd4u2mo ago

What does a 12-hour solution cost an OpenAI customer?

int_19h2mo ago

$200/month would cover many such sessions every month.

The real question is, what does it cost OpenAI? I'm pretty sure both their plans are well below cost, at least for users who max them out (and if you pay $200 for something then you'll probably do that!). How long before the money runs out? Can they get it cheap enough to be profitable at this price level, or is this going to be "get them addicted then jack it up" kind of strategy?

revlsas2mo ago

No because open source models are close behind

Compute costs will fall drastically for existing models

But it's likely that frontier models of the future won't be released to the public at all, because they'll be too good

1 more reply

sathish3162mo ago

> I haven’t been to get them to do something totally out of distribution yet from first principles.

Agree with this. I’ve been trying to make LLMs come up with creative and unique word games like Wordle and Uncrossy (uncrossy.com), but so far GPT-5.2 has been disappointing. Comparatively, Opus 4.5 has been doing better on this.

But it’s good to know that it’s breaking new ground in Theoretical Physics!

FranklinJabar2mo ago

Surely higher level math is just linear combinations of the syntax and implications of lower level math. LLMs are taught syntax of basically all existing math notation, I assume. Much of math is, after all, just linguistic manipulation and detection of contradiction in said language with a more formal, a priori language.

MITSardine2mo ago

LLMs can write theorems, but can they come up with meaningful definitions?

FranklinJabar2mo ago

I intended to imply this with "detection of contradiction". Coherence seems to me to be the only a priori meaning. Most of the meaning of "meaning" seems to me to be a posteriori. After all, what is the point of an a priori floating signifier?

1 more reply

acchow2mo ago

> In my experience LLM’s can make new things when they are some linear combination of existing things

It seems to me that all “new ideas” are basically linear combinations of existing things with exceeding rare exceptions…

Maybe Godel’s Incompleteness?

Darwinian evolution?

General Relativity?

Buddhist non-duality?

malshe2mo ago

My physics professor once claimed that imagination is just mental manipulation of past experiences. I never thought it was true for human beings but for LLMs it makes perfect sense.

zaphirplane2mo ago

I must be a Luddite, how do you have a model working for 12 hours on a problem. Mine is ready with an answer and always interrupts to ask confirmation or show answer

arjie2mo ago

That's on the harness - the device actually sending the prompt to the model. You can write a different harness that feeds the problem back in for however long you want. Ask Claude Code or Codex to build it for you in as minimal a fashion as possible and you'll see that a naïve version is not particularly more complex than `while true; do prompt $file >> file; done` (though it's not that precisely, obviously).

DeathArrow2mo ago

>LLM’s can make new things when they are some linear combination of existing things

Aren't most new things linear combinations of existing things (up to a point)?

waynesonfire2mo ago

> It took 12 hours for GPT pro to do this

Thanks for the summary; but this is a huge hand-wave. was GPT Pro just spinning for 12 hours and returend 42?!

Sparkyte2mo ago

AI cough LLMs don't discover things they simply surface information that already existed.

slibhb2mo ago

You're assuming there aren't "new things" latent inside currently existing information. That's definitely false, particulary for math/physics.

But it's worth thinking more about this. What gives humans the ability to discover "new things"? I would say it's due to our interaction with the universe via our senses, and not due to some special powers intrinsic to our brains that LLMs lack. And the thing is, we can feed novel measurements to LLMs (or, eventually, hook them up to camera feeds to "give them senses")

Sparkyte2mo ago

No it isn't false. If it is new it is novel, novel because it is known to some degree and two other abstracted known things prove the third. Just pattern matching connecting dots.

1 more reply

ctoth2mo ago

In my experience humans can make new things when they are some linear combination of existing things but I haven’t been able to get them to do something totally out of distribution yet from first principles[0].

[0]: https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-g...

bottlepalm2mo ago

Is every new thing not just combinations of existing things? What does out of distribution even mean? What advancement has ever made that there wasn’t a lead up of prior work to it? Is there some fundamental thing that prevents AI from recombining ideas and testing theories?

outlace2mo ago

For example, ever since the first GPT 4 I’ve tried to get LLM’s to build me a specific type of heart simulation that to my knowledge does not exist anywhere on the public internet (otherwise I wouldn’t try to build it myself) and even up to GPT 5.3 it still cannot do it.

But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.

And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.

fpgaminer2mo ago

> Is every new thing not just combinations of existing things?

If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?

I think it's more fair to say that recombining ideas is an efficient way to quickly explore a very complex, hyperdimensional space. In some cases that's enough to land on new, useful ideas, but not always. A) the new, useful idea might be _near_ the area you land on, but not exactly at. B) there are whole classes of new, useful ideas that cannot be reached by any combination of existing "idea vectors".

Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.

All this to say: Every new thing is a combination of existing things + sweat and tears.

The question everyone has is, are current LLMs capable of the latter component. Historically the answer is _no_, because they had no real capacity to iterate. Without iteration you cannot explore. But now that they can reliably iterate, and to some extent plan their iterations, we are starting to see their first meaningful, fledgling attempts at the "sweat and tears" part of building new ideas.

drdeca2mo ago

Well, what exactly an “idea” is might be a little unclear, but I don’t think it clear that the complexity of ideas that result from combining previously obtained ideas would be bounded by the complexity of the ideas they are combinations of.

Any countable group is a quotient of a subgroup of the free group on two elements, iirc.

There’s also the concept of “semantic primes”. Here is a not-quite correct oversimplification of the idea: Suppose you go through the dictionary and one word at a time pick a word whose definition includes only other words that are still in the dictionary, and removing them. You can also rephrase definitions before doing this, as long as it keeps the same meaning. Suppose you do this with the goal of leaving as few words in it as you can. In the end, you should have a small cluster of a bit over 100 words, in terms of which all the other words you removed can be indirectly defined. (The idea of semantic primes also says that there is such a minimal set which translates essentially directly* between different natural languages.)

I don’t think that says that words for complicated ideas aren’t like, more complicated?

djeastm2mo ago

>If all ideas are recombinations of old ideas, where did the first ideas come from?

Ideas seem to just be our abstractions of neural impulses from deep in evolution.

red75prime2mo ago

"Sweat and tears" -> exploration and the training signal for reinforcement learning.

D-Machine2mo ago

> What does out of distribution even mean?

There are in fact ways to directly quantify this, if you are training e.g. a self-supervised anomaly-detection model.

Even with modern models not trained in that manner, looking at e.g. cosine distances of embeddings of "novel" outputs could conceivably provide objective evidence for "out-of-distribution" results. Generally, the embeddings of out-of-distribution outputs will have a large cosine (or even Euclidean) distance from the typical embedding(s). Just, most "out-of-distribution" outputs will be nonsense / junk, so, searching for weird outputs isn't really helpful, in general, if your goal is useful creativity.

amelius2mo ago

Just wait until LLMs are fast and cheap enough to be run in a breadth first search kind of way, with "fuzzy" pruning.

bamboozled2mo ago

All you have to do is see "openai.com" in the submission URL to know it's bullshit.

mirsadm2mo ago

My issue with any of these claims is the lack of proof. Just share the chat and now it got to the discovery. I'll believe it when I can see it for myself at this point. It's too easy to make all sorts of claims without proof these days. Elon Musk makes them all the time.

square_usual2mo ago

It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why this isn't actually a win for LLMs. Like with the novel solutions GPT 5.2 has been able to find for erdos problems - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, LLMs have driven these proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution...

loire2802mo ago

It's easy to fall into a negative mindset when there are legions of pointy haired bosses and bandwagoning CEOs who (wrongly) point at breakthroughs like this as justification for AI mandates or layoffs.

threethirtytwo2mo ago

I think it's more insidious then this.

It's easy to fall into a negative mindset because the justification is real and what we see is just the beginning.

Obviously we are not at a point where developers aren't needed. But One developer can do more. And that is a legitimate reason to higher less developers.

The impending reality of the upward moving trendline is that AI becomes so capable that it can replace the majority of developers. That future is so horrifying that people need to scaffold logic to unjustifiy it.

anon848736282mo ago

What does "pointy haired" mean? (Presumably not literally?)

FeteCommuniste2mo ago

The "pointy-haired boss" was a character in the Dilbert comics, an archetypical know-nothing manager who spews jargon, jumps on trends, and takes credit for ideas that aren't his.

brokencode2mo ago

Crazy that an honest question like this gets downvoted.

I honestly think the downvote button is pretty trash for online communities. It kills diversity of thought and discussion and leaves you with an echo chamber.

If you disagree with or dislike something, leave a response. Express your view. Save the downvotes for racism, calls for violence, etc.

1 more reply

dakolli2mo ago

Yes, all of these stories, and frequent model releases are just intended to psyop "decision makers" into validating their longstanding belief that the labour shouldn't be as big of a line item in a companies expenses, and perhaps can be removed altogether.. They can finally go back to the good old days of having slaves (in the form of "agentic" bots), they yearn to own slaves again.

CEOs/decision makers would rather give all their labour budget to tokens if they could just to validate this belief. They are bitter that anyone from a lower class could hold any bargaining chips, and thus any influence over them. It has nothing to do with saving money, they would gladly pay the exact same engineering budget to Anthropic for tokens (just like the ruling class in times past would gladly pay for slaves) if it can patch that bitterness they have for the working class's influence over them.

The inference companies (who are also from this same class of people) know this, and are exploiting this desire. They know if they create the idea that AI progress is at an unstoppable velocity decision makers will begin handing them their engineering budgets. These things don't even have to work well, they just need to be perceived as effective, or soon to be for decision makers to start laying people off.

I suspect this is going to backfire on them in one of two ways.

1. French Revolution V2, they all get their heads cutoff in 15 years, or an early retirement on a concrete floor.

2. Many decisions makers will make fools of themselves, destroy their businesses and come begging to the working class for our labor, giving the working class more bargaining chips in the process.

Either outcome is going to be painful for everyone, lets hope people wake up before we push this dumb experiment too far.

janalsncm2mo ago

I’m reminded of Dan Wang’s commentary on US-China relations:

> Competition will be dynamic because people have agency. The country that is ahead at any given moment will commit mistakes driven by overconfidence, while the country that is behind will feel the crack of the whip to reform. … That drive will mean that competition will go on for years and decades.

https://danwang.co/ (2025 Annual letter)

The future is not predetermined by trends today. So it’s entirely possible that the dinosaur companies of today can’t figure out how to automate effectively, but get outcompeted by a nimble team of engineers using these tools tomorrow. As a concrete example, a lot of SaaS companies like Salesforce are at risk of this.

1 more reply

lovecg2mo ago

Let’s have some compassion, a lot of people are freaking out about their careers now and defense mechanisms are kicking in. It’s hard for a lot of people to say “actually yeah this thing can do most of my work now, and barrier of entry dropped to the ground”.

Toutouxc2mo ago

I am constantly seeing this thing do most of my work (which is good actually, I don't enjoy typing code), but requiring my constant supervision and frequent intervention and always trying to sneak in subtle bugs or weird architectural decisions that, I feel with every bone in my body, would bite me in the ass later. I see JS developers with little experience and zero CS or SWE education rave about how LLMs are so much better than us in every way, when the hardest thing they've ever written was bubble sort. I'm not even freaking about my career, I'm freaking about how much today's "almost good" LLMs can empower incompetence and how much damage that could cause to systems that I either use or work on.

kilroy1232mo ago

I agree with you on all of it.

But _what if_ they work out all of that in the next 2 years and it stops needing constant supervision and intervention? Then what?

4 more replies

nprateem2mo ago

Yes and look how far we've come in 4 years. If programming has another 4 that's all it has.

I'm just not sure who will end up employed. The near state is obviously jira driven development where agents just pick up tasks from jira, etc. But will that mean the PMs go and we have a technical PM, or will we be the ones binned? Probably for most SMEs it'll just be maybe 1 PM and 2 or so technical PMs churning out tickets.

But whatever. It's the trajectory you should be looking at.

threethirtytwo2mo ago

Have you ever thought about the fact that 2 years ago AI wasn't even good enough to write code. Now it's good enough.

Right now you state the current problem is: "requiring my constant supervision and frequent intervention and always trying to sneak in subtle bugs or weird architectural decisions"

But in 2 years that could be gone too, given the objective and literal trendline. So I actually don't see how you can hold this opinion: "I'm not even freaking about my career, I'm freaking about how much today's "almost good" LLMs can empower incompetence and how much damage that could cause to systems that I either use or work on." when all logic points away from it.

We need to be worried, LLMs are only getting better.

1 more reply

threethirtytwo2mo ago

I'm all for this. But it's the delusion and denialism of people not wanting to face reality.

Like I have compassion, but I can't healthily respect people who try so hard to rewrite reality so that the future isn't so horrifying. I'm a SWE and I'm affected too, but it's not like I'm going to lie to myself about what's happening.

dakolli2mo ago

Yeah but you know what, this is a complete psyop.

They just want people to think the barrier of entry has dropped to the ground and that value of labour is getting squashed, so society writes a permission slip for them to completely depress wages and remove bargaining chips from the working class.

Don't fall for this, they want to destroy any labor that deals with computer I/0, not just SWE. This is the only value "agentic tooling" provides to society, slaves for the ruling class. They yearn for the opportunity to own slaves again.

It can't do most of your work, and you know that if you work on anything serious. But If C-suite who hasn't dealt with code in two decades, thinks this is the case because everyone is running around saying its true they're going to make sure they replace humans with these bot slaves, they really do just want slaves, they have no intention of innovating with these slaves. People need to work to eat, now unless LLMs are creating new types of machines that need new types of jobs, like previous forms of automation, then I don't see why they should be replacing the human input.

If these things are so good for business, and are pushing software development velocity.. Why is everything falling apart? Why does the bulk of low stakes software suck. Why is Windows 11 so bad? Why aren't top hedge funds, medical device manufactures (places where software quality is high stakes) replacing all their labor? Where are the new industries? They don't do anything novel, they only serve to replace inputs previously supplied by humans so the ruling class can finally get back to good old feeling of having slaves that can't complain.

D-Machine2mo ago

"It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that that the sky is falling and AGI is imminent. Like with the recent solutions GPT 5.2 has been able to find for Erdos problems, even though in almost all cases such solutions rely on poorly-known past publications, or significant expert user guidance and essential tools like Aristotle, which do non-AI formal verification - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, though these are not interesting proofs to most modern mathematicians, LLMs are a major factor in a tiny minority of these mostly-not-very-interesting proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution..."

The thing about spin and AI hype (besides being trivially easy to write) is that is isn't even trying to be objective. It would help if a lot of these articles would more carefully lay out what is actually surprising, and what is not, given current tech and knowledge.

Only a fool would think we aren't potentially on the verge of something truly revolutionary here. But only a fool would also be certain that the revolution has already happened, or that e.g. AGI is necessarily imminent.

The reason HN has value is because you can actually see some specifics of the matter discussed, and, if you are lucky, an expert even might join in to qualify everything. But pointing out "how interesting that there are extremes to this" is just engagement bait.

famouswaffles2mo ago

>It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that that the sky is falling and AGI is imminent.

Really? Is that happening in this thread because I can barely see it. Instead you have a bunch of asinine comments butthurt about acknowledging a GPT contribution that would have been acknowledged any day had a human done it.

>they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, though these are not interesting proofs to most modern mathematicians, LLMs are a major factor in a tiny minority of these mostly-not-very-interesting proofs

This is part of the problem really. Your framing is disingenuous and I don't really understand why you feel the need to downplay it so. They are interesting proofs. They are documented for a reason. It's not cutting edge research, but it is LLMs contributing meaningfully to formal mathematics, something that was speculative just years ago.

D-Machine2mo ago

> Your framing is weirdly disingenuous

I am not surprised that you can't understand that the quote I am making is obviously parodying the OP as disingenuous. Given our previous interactions (https://news.ycombinator.com/item?id=46938446), it is clear you don't understand much things about AI and/or LLMs, or, perhaps, basic communication, at all.

1 more reply

Leynos2mo ago

Can we not just say "this is pretty cool" and enjoy it rather than turning it into a fight?

threethirtytwo2mo ago

>Only a fool would think we aren't potentially on the verge of something truly revolutionary here. But only a fool would also be certain that the revolution has already happened, or that e.g. AGI is necessarily imminent.

This sentence sounds contradictory. You're a fool to not think we're on the verge of something revolutionary and you are a fool if you think something revolutionary like AGI is on the verge of happening?

But to your point if "revolutionary" and "agi" are different things, I'm certain the "revolution" has already happened. ChatGPT was the step function change and everything else is just following the upwards trendline post release of chatGPT.

Anecdotally I would say 50% of developers never code things by hand anymore. That is revolutionary in itself and by the statement itself it has already literally happened.

krackers2mo ago

Because most times results like this are overstated (see the Cursor browser thing, "moltbook", etc.). There is clear market incentive to overhype things.

And in this case "derives a new result in theoretical physics" is again overstating things, it's closer to "simplify and propose a more general form for a previously worked out sequence of amplitudes" which sounds less magical, and closer to something like what Mathematica could do, or an LLM-enhanced symbolic OEIS. Obviously still powerful and useful, but less hype-y.

newswasboring2mo ago

> it's closer to "simplify and propose a more general form for a previously worked out sequence of amplitudes"

How is this different than a new result? Many a careers in academia are built on simplifying mathematics.

tclancy2mo ago

> It's interesting to me that whenever a new breakthrough in AI use comes up,

It's interesting to me that whenever AI gets a bunch of instructions from a reasonably bright person who has a suspicion about something, can point at reasons why, but not quite put their finger on it, we want to credit the AI for the insight.

cman14442mo ago

If the AI were instead human, that human would almost certainly be cited as a co-author, contributor, or whatever.

austinwade2mo ago

Do you not see how this clearly is an advancement for the field, in that AI does deserve partial credit here in improving humanity’s understanding & innovative capabilities? Can you not mentally extrapolate the compounding of this effect & how AI is directly contributing to an acceleration of humanity’s knowledge acquisition?

hgfda2mo ago

It is not only the the peanut gallery that is skeptical:

https://www.math.columbia.edu/~woit/wordpress/?p=15362

Let's wait a couple of days whether there has been a similar result in the literature.

gjm112mo ago

For the sake of clarity: Woit's post is not about the same alleged instance of GPT producing new work in theoretical physics, but about an earlier one from November 2025. Different author, different area of theoretical physics.

etraql2mo ago

This thread is about "whenever a new breakthrough in AI use comes up", and the comment you reply to correctly points out skepticism for the general case and does not claim any relation to the current case.

You reached your goal though and got that comment downvoted.

1 more reply

epolanski2mo ago

It's an obvious tension created by the title.

The reality is: "GPT 5.2 found a more general and scalable form of an equation, after crunching for 12 hours supervised by 4 experts in the field".

Which is equivalent to taking some of the countless niche algorithms out there and have few experts in that algo have LLMs crunch tirelessly till they find a better formula. After same experts prompted it in the right direction and with the right feedback.

Interesting? Sure. Speaks highly of AI? Yes.

Does it suggest that AI is revolutionizing theoretical physics on its own like the title does? Nope.

jdthedisciple2mo ago

> GPT 5.2 after crunching 12 hours mathematical formulas supervised and prompted by 4 experts in the field

Yet, if some student or child achieved the same – under equal supervision – we would call him the next Einstein.

epolanski2mo ago

We would not call him at all because it would be one of the many millions that went through projects like this for their thesis as physics or math graduates.

One of my best friends in his bachelor thesis had solved a difficult mathematical problem in planet orbits or something, and it was just yet another random day in academia.

And she didn't solve it because she was a genius but because there's a bazillions such problems out there and little time to look at them and focus. Science is huge.

2 more replies

suddenlybananas2mo ago

Yes and if a 1 year old could multiply 1357329 by 28384743, I'd be impressed and yet I still wouldn't be impressed by a calculator doing it.

1 more reply

MatejKafka2mo ago

I don't think it's about trying to handwave away the achievement. The problem is that many AI proponents, and especially companies producing the LLM tools constantly overstate the wins while downplaying the issues, and that leads to a (not always rational) counter-reaction from the other side.

D-Machine2mo ago

It is especially glaring in this case because, when queried, it is clear that far too many of the most zealous proponents don't even understand the simplest basics of how these models actually work (e.g. tokenization, positional or other encoding schemes, linear algebra, pre-training, basic input/output shaping/dimensions, recursive application, training data sources, etc).

There are simple limitations that follow from these basic facts (or which follow with e.g. extreme but not 100% certainty), such that many experts openly state that e.g. LLMs have serious limitations, but, still, despite all this, you get some very extreme claims about capabilities, from supporters, that are extremely hard to reconcile with these basic and indisputable facts.

That, and the massive investment and financial incentives means that the counter-reaction is really quite rational (but still potentially unwarranted, in some/many practical cases).

NegativeK2mo ago

The same crap happened with cryptocurrency: it was either aggressively pro or aggressively against, and everyone who could be heard was yelling as loud as they could so they didn't have to hear disagreement.

There is no loud, moderate voice. It makes me very tired of the blasting rhetoric that invades _every_ space.

MatejKafka2mo ago

https://simonwillison.net/ is a pretty loud and moderate voice in the community. Also active on Lobste.rs: https://lobste.rs/~simonw

But agree that there's an irrational level of tribalism on both sides.

ijidak2mo ago

Reminds me of the famous quote that it's hard to get someone to understand something when their job depends on not understanding it.

It reminds me of an episode of Star Trek, "The Measure of a Man" I think it's called, where it is argued that Data is just a machine and Picard tries to prove that no he is a life form.

And the challenge is, how do you prove that?

Every time these LLMs get better, the goalposts move again.

It makes me wonder, if they ever did become sentient, how would they be treated?

It's seeming clear that they would be subject to deep skepticism and hatred much more pervasive and intense than anything imagined in The Next Generation.

otabdeveloper42mo ago

> why this isn't actually a win for LLMs

Wait, so this is now a contest (or maybe war) that LLMs are supposed to win?

Wild.

Bengalilol2mo ago

I have no doubts about that.

What I question here is OpenAI's article: it could be way more generous towards the reader.

bjackman2mo ago

The discourse about AI is definitely the worst I've ever experienced in my life.

One group of people saying every amazing breakthrough "doesn't count" because the AI didn't put a cherry on top. Another group of people saying humans are obsolete, I just wrote a web browser with AI bro.

There are some voices out there that are actually examining the boundaries, possibilities and limitations. A lot of good stuff like that makes it onto HN but then if you open the comments it's just intellectual dregs. Very strange.

ISTR there was a similar phenomenon with cryptocurrency. But with that it was always clear the fog of bullshit would blow away sooner or later. But maybe if it hadn't been there, a load of really useful stuff could have come out of the crypto hype wave? Anyway, AI isn't gonna blow over like crypto did. I guess we have more of a runway to grow out of this infantile phase.

CrimsonRain2mo ago

Clankists feel threatened. That's the gist of it.

_giorgio_2mo ago

Always moving targets.

They never surrender.

D-Machine2mo ago

"They're moving the goalposts" is increasingly the autistic shrieking of someone with no serious argument or connection to reality whatsoever.

No one cares about how "AGI" or whatever the fuck term or internet-argument goalpost you cared about X months ago was. Everyone cares about what current tech can do NOW, and under what conditions, and when it fails catastrophically. That is all that matters.

So, refining the conditions of an LLM win (or loss) is all that matters (not who wins or loses depending on some particular / historical refinement). Complaining that some people see some recent result as a loss (or win) is just completely failing to understand the actual game being played / what really matters here.

_giorgio_2mo ago

I have no idea what you're talking about.

I'm just saying that AI critics like to say that they don't like AI, and to prove their point they constantly move up their definition of "good enough", and when and AI reaches that objective, they change their definition of good enough.

threethirtytwo2mo ago

Yeah it's pervasive. It's also delusional.

Take a look at this entire thread. Everyone and I mean everyone is talking as if AI is some sort of fraud and everything is just hype. But then this thread is all against, AI, I mean all of it. If anything the Anti-hype around AI is what's flooding the world right now. If AI hype was through the roof we'd see the opposite effect on HN.

I think it's a strange contradiction in the human mind. At work outside of HN, what I see is roughly 50-60% of developers no longer code by hand. They all use AI. Then they come onto HN and they start Anti-hyping it. It's universal. They use it and they're against it at the same time.

The contradiction is strange, but it also makes sense because AI is a thing that is attacking what programmers take pride in. Most programmers are so proud of their abilities and intelligence as it relates to their jobs and livelihood. AI is on a trendline of replacing this piece by piece. It makes perfect sense for them to talk shit but at the same time they have to use it to keep up with the competition.

cxvwK2mo ago

[flagged]

threethirtytwo2mo ago

yep. because I find the bias overly negative.

Let me state the negative things about LLMs: they hallucinate. They are not as reliable as humans. They can lie. They can be deceptive.

But despite all of this people are so negative about it even when 50% of deveopers now don't write code by hand because of AI. The trend from 0 AI to code being written by AI in a couple years cannot be denied and it also spells out a future where the negatives of AI become more and more diminished.

The anti-hype is predictable. It's when something becomes too pervasive and too popular and overused people start talking shit and ignoring the on the ground reality.

Guys if you think AI is shit, take an oath on never using it. Stop all usage of it for the rest of your life. See how far that takes you. Put the money where your mouth is, if it's so bad, come off of it and stop using it completely. Most of you can't... because you're all lying to yourselves.

1 more reply

Davidzheng2mo ago

"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."

When I use GPT 5.2 Thinking Extended, it gave me the impression that it's consistent enough/has a low enough rate of errors (or enough error correcting ability) to autonomously do math/physics for many hours if it were allowed to [but I guess the Extended time cuts off around 30 minute mark and Pro maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists/mathematicians at large will be able to play with tools which think at this time-scale soon and see how much capabilities these machines really have.

mmaunder2mo ago

Yes and 5.3 and the latest codex cli client is incredibly good across compactions. Anyone know the methodology they're using to maintain state and manage context for a 12 hour run? It could be as simple as a single dense document and its own internal compaction algrorithm, I guess.

knicholes2mo ago

https://developers.openai.com/cookbook/articles/codex_exec_p... might be what you're looking for

slopusila2mo ago

after those 30 min you can manually ask it again to continue working on the problem

Davidzheng2mo ago

It's a bit unclear to me what happens if I do that after it thinks for 30 minutes and ends with no response. Does it start off where it left off? Does it start from scratch again? Like I don't know how the compaction of their prior thinking traces work

cpard2mo ago

AI can be an amazing productivity multiplier for people who know what they're doing.

This result reminded me of the C compiler case that Anthropic posted recently. Sure, agents wrote the code for hours but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work etc etc. In general making sure the output actually works and that it's a story worth sharing with others.

The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value but also does a disservice to the actual researchers, engineers and humans in general, who do the hard work of problem formulation, validation and at the end, solving the problem using another tool in their toolbox.

supern0va2mo ago

>AI can be an amazing productivity multiplier for people who know what they're doing.

>[...]

>The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

You're sort of acting like it's all or nothing. What about the the humans that used to be that "force multiplier" on a team with the person guiding the research?

If a piece of software required a team of ten to people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

For a more current example: do you think all the displaced Uber/Lyft drivers aren't going to think "AI took my job" just because there's a team of people in a building somewhere handling the occasional Waymo low confidence intervention, as opposed to being 100% autonomous?

guluarte2mo ago

Where I work, we're now building things that were completely out of reach before. The 90% job loss prediction would only hold true if we were near the ceiling of what software can do, but we're probably very, very far from it.

A website that cost hundreds of thousands of dollars in 2000 could be replaced by a wordpress blog built in an afternoon by a teenager in 2015. Did that kill web development? No, it just expanded what was worth building

matwood2mo ago

> If a piece of software required a team of ten to people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

Yes, but this assumes a finite amount of software that people and businesses need and want. Will AI be the first productivity increase where humanity says ‘now we have enough’? I’m skeptical.

kaibee2mo ago

> Yes, but this assumes a finite amount of software that people and businesses need and want.

A lot of software exists because humans are needy and kinda incompetent, but we needed to enable to process data at scale? Like, would you build SAP as it is today, for LLMs?

throwaway7432mo ago

This is all inevitable with the trajectory of technology, and has been apparent for a long time. The issue isn't AI, it's that our leaders haven't bothered to think or care about what happens to us when our labor loses value en masse due to such advances.

Maybe it requires fundamentally changing or economic systems? Who knows what the solution is, but the problem is most definitely rooted in lack of initiative by our representatives and an economic system that doesn't accommodate us for when shit inevitably hits the fan with labor markets.

cpard2mo ago

there's 90% job loss assuming that this is a zero sum type of thing where humans and agents compete for working on a fixed amount of work.

I'm curious why you think I'm acting like it's all or nothing. What I was trying to communicate is the exact opposite, that it's not all or nothing. Maybe it's the way I articulate things, I'm genuinely interested what makes it sound like this.

ramathornn2mo ago

Fully agree with your og comment and I didn’t get the same read as the person above at all.

This is a bizarre time to be living in, on one hand these tools are capable of doing more and more of the tasks any knowledge worker today handles, especially when used by an experienced person in X field.

On the other, it feels like something is about to give. All the superbowl ads, AI in what feels like every single piece of copy coming out these days. AI CEOs hopping from one podcast to another warning about the upcoming career apocalypse…I’m not fully buying it.

Human-Cabbage2mo ago

The optimistic case is that instead of a team of 10 people working on one project, you could have those 10 people using AI assistants to work on 10 independent projects.

That, of course, assumes that there are 9 other projects that are both known (or knowable) and worth doing. And in the case of Uber/Lyft drivers, there's a skillset mismatch between the "deprecated" jobs and their replacements.

bagacrap2mo ago

Well those Uber drivers are usually pretty quick to note that Uber is not their job, just a side hustle. It's too bad I won't know what they think by then since we won't be interacting any more.

jonahx2mo ago

> The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

It's also a legitimate concern. We happen to be in a place where humans are needed for that "last critical 10%," or the first critical 10% of problem formulation, and so humans are still crucial to the overall system, at least for most complex tasks.

But there's no logical reason that needs to be the case. Once it's not, humans will be replaced.

cpard2mo ago

The reason there is a marketing opportunity is because, to your point, there is a legitimate concern. Marketing builds and amplifies the concern to create awareness.

When the systems turn into something trivial to manage with the new tooling, humans build more complex or add more layers on the existing systems.

krethh2mo ago

The logical reason is that humans are exceptionally good at operating at the edge of what the technology of the time can do. We will find entire classes of tech problems which AI can't solve on its own. You have people today with job descriptions that even 15 years ago would have been unimaginable, much less predictable.

To think that whatever the AI is capable of solving is (and forever will be) the frontier of all problems is deeply delusional. AI got good at generating code, but it still can't even do a fraction of what the human brain can do.

jonahx2mo ago

> To think that whatever the AI is capable of solving is (and forever will be) the frontier of all problems is deeply delusional. AI got good at generating code, but it still can't even do a fraction of what the human brain can do.

AGI means fully general, meaning everything the human brain can do and more. I agree that currently it still feels far (at least it may be far), but there is no reason to think there's some magic human ingredient that will keep us perpetually in the loop. I would say that is delusional.

We used to think there was human-specific magic in chess, in poker, in Go, in code, and in writing. All those have fallen, the latter two albeit only in part but even that part was once thought to be the exclusive domain of humans.

1 more reply

decidu0us90342mo ago

I'm not sure you can call something an optimizing C compiler if it doesn't optimize or enforce C semantics (well, it compiles C but also a lot of things that aren't syntactically valid C). It seemed to generate a lot of code (wow!) that wasn't well-integrated and didn't do what it promised to, and the human didn't have the requisite expertise to understand that. I'm not a theoretical physicist but I will hold to my skepticism here, for similar reasons.

cpard2mo ago

sure, I won't argue on this, although it did manage to deliver the marketing value they were looking for, at the end their goal was not to replace gcc but to make people talk about AI and Anthropic.

What I said in my original comment is that AI delivers when it's used by experts, in this case there was someone who was definitely not a C compiler expert, what would happen if there was a real expert doing this?

BrouteMinou2mo ago

Deliver what exactly? False hope and lies?

https://github.com/anthropics/claudes-c-compiler/issues/228

elzbardico2mo ago

Actually, the results were far worse and way less impressive than what the media said.

cpard2mo ago

the c compiler results or the physics results this post is about?

elzbardico2mo ago

The C compiler.

1 more reply

NewsaHackO2mo ago

His point is going to be some copium like since the c compiler is not as optimized as gcc, it was not impressive.

1 more reply

hananova2mo ago

AI is indeed an amazing productivity multiplier! Sadly that multiplier is in the range [0; 1).

kylehotchkiss2mo ago

> for people who know what they're doing.

I worry we're not producing as many of those as we used to

blks2mo ago

We will be producing them even less. I fear for the future graduates, hell even for school children, who are now uncontrollably using ChatGPT for their homework. Next level brainrot

fragmede2mo ago

Right. If it hadn't been Nicholas Carlini driving Claude, with his decades of experience, there wouldn't be a Claude c compiler. It still required his expertise and knowledge for it to get there.

1 more reply

ece2mo ago

Everytime I see a RL startup, a data startup or even a startup focused on a specific vertical, I think this exact same thing about LLMs.

nilkn2mo ago

It would be more accurate to say that humans using GPT-5.2 derived a new result in theoretical physics (or, if you're being generous, humans and GPT-5.2 together derived a new result). The title makes it sound like GPT-5.2 produced a complete or near-complete paper on its own, but what it actually did was take human-derived datapoints, conjecture a generalization, then prove that generalization. Having scanned the paper, this seems to be a significant enough contribution to warrant a legitimate author credit, but I still think the title on its own is an exaggeration.

uh_uh2mo ago

Would you be similarly pedantic if a high-schooler did the same?

nilkn2mo ago

Yes. Someone making one contribution among many to a paper clearly does not deserve anything like sole authorship credit of the entire paper, which is what the title from OpenAI implies to me. I don't believe I'm being pedantic at all. And, by the way, high schoolers or college students make co-author-level contributions to real papers quite frequently in the US at least (I was one of them).

The text of the post is much more honest. The title is where the dishonesty is.

lupsasca2mo ago

Hi, I'm an author on the paper. It was definitely a human-AI collaboration, but it is also true that the final simplified formula, Eq. 39 in the paper (which is what we had been seeking, without success), was conjectured and proved by GPT. So it derived a new result in theoretical physics. I'm genuinely puzzled by your complaint.

turzmo2mo ago

Physicist here. Did you guys actually read the paper? Am I missing something? The "key" AI-conjectured formula (39) is an obvious generalization of (35)-(38), and something a human would have guessed immediately.

(35)-(38) are the AI-simplified versions of (29)-(32). Those earlier formulae look formidable to simplify by hand, but they are also the sort of thing you'd try to use a computer algebra system for.

I'm willing to (begrudgingly) admit the possibility for AI to do novel work, but this particular result does not seem very impressive.

I picture ChatGPT as the rich kid whose parents privately donated to a lab to get their name on a paper for college admissions. In this case, I don't think I'm being too cynical in thinking that something similar is happening here and that the role of AI in this result is being well overplayed.

radioactivist2mo ago

Also a physicist here -- I had the same reaction. Going from (35-38) to (39) doesn't look like much of a leap for a human. They say (35-38) was obtained from the full result by the LLM, but if the authors derived the full expression in (29-32) themselves presumably they could do the special case too? (given it's much simpler). The more I read the post and preprint the less clear it is which parts the LLM did.

Insanity2mo ago

They also claimed ChatGPT solved novel erdös problems when that wasn’t the case. Will take with a grain of salt until more external validation happened. But very cool if true!

famouswaffles2mo ago

Well they (OpenAI) never made such a claim. And yes, LLMs have made unique solutions/contributions to a few erdos problems.

smokel2mo ago

How was that not the case? As far as I understand it ChatGPT was instrumental to solving a problem. Even if it did not entirely solve it by itself, the combination with other tools such as Lean is still very impressive, no?

emil-lp2mo ago

It didn't solve it, it simply found that it had been solved in a publication and that the list of open problems wasn't updated.

Davidzheng2mo ago

My understanding is there's been around 10 erdos problems solved by GPT by now. Most of them have been found to be either in literature or a very similar problem was solved in literature. But one or two solutions are quite novel.

https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful

3 more replies

vonneumannstan2mo ago

Wasnt that like some marketing bro? This is coming out the front door with serious physicists attached.

castigatio2mo ago

I'm not sure where people think humans are getting these magical leaps of insight that transcend combinations of existing things. Magic? Ghost in the machine? The simplest explanation is that "leaps of insight" are simply novel combinations that demonstrate themselves to have some utility within the boundaries of a test case or objective.

Snow + stick + need to clean driveway = snow shovel. Snow shovel + hill + desire for fun = sled

At one point people were arguing that you could never get "true art" from linear programs. Now you get true art and people are arguing you can't get magical flashes of insight. The will to defend human intelligence / creativity is strong but the evidence is weak.

hiAndrewQuinn2mo ago

Some people defend it because they are nondualists. They think the moral value of human life rounds to zero against the existence of something which can effortlessly outclass them in all domains. This is obviously confused, but they can't bring themselves to say "Very cool, and also I think humans are inherently special and deserve to continue existing even if all we do is lie around all day and watch the Hallmark channel."

Happy Valentine's day to those who celebrate btw <3

qnleigh2mo ago

I'm surprised to see that the valence of comments here is mostly negative. Nima Arkhami-Hamed is one of the top living physicists, and he has nice things to say about the work. The fact that researchers can increasingly use these models to (help) find new results is a big deal, even considering the caveats.

mym19902mo ago

Many innovations are built off cross pollination of domains and I think we are not too far off from having a loop where multiple agents grounded very well in specific domains can find intersections and optimizations by communicating with each other, especially if they are able to run for 12+ hours. The truth is that 99% of attempts at innovation will fail, but the 1% can yield something fantastic, the more attempts we can take, the faster progress will happen.

alansaber2mo ago

I find it hard not to agree with this line of thinking (albeit will be less than 1%)

elashri2mo ago

I would be less interested in scattering amplitude of all particle physics concepts as a test case because the scattering amplitudes because it is one of the concisest definition and its solution is straightforward (not easy of course). So once you have a good grasp of the QM and the scattering then it is a matter of applying your knowledge of math to solve the problem. Usually the real problem is to actually define your parameters from your model and define the tree level calculations. Then for LLM to solve these it is impressive but the researchers defined everything and came up with the workflow.

So I would read this (with more information available) with less emphasize on LLM discovering new result. The title is a little bit misleading but actually "derives" being the operative word here so it would be technically correct for people in the field.

JanisErdmanis2mo ago

Such tedious derivations used to be a work of poor PhD students who were instrumentalized for such tasks. I envy those who do PhDs in theoretical physics in the age of AI, people can learn so much about their field quicker via chat than reading obstructing papers.

crorella2mo ago

The preprint: https://arxiv.org/abs/2602.12176

vbarrielle2mo ago

I' m far from being an LLM enthusiast, but this is probably the right use case for this technology: conjectures which are hard to find, but then the proof can be checked with automated theorem provers. Isn't it what AlphaProof does by the way?

w37_5observer2mo ago

Technical Note: Mathematical Isomorphism with HSPT (Dec 2025)As an independent researcher, I am noting a striking isomorphism between the "piecewise-constant" results in Eq. 39 and the Hypersurface Singularity Projection Theory (HSPT) framework, finalized in December 2025.The residues described here align with the stationary warp attractor at $W=37.5$ derived in my 44-page framework. This solution was developed throughout 2025 using a collaborative methodology with Large Language Models (Gemini and ChatGPT) to derive and verify the $D=7$ geometric equations.Priority & Evidence:Submission: The full 44-page manuscript has been uploaded to viXra.org (Pending Approval) under the title "The Hypersurface Singularity Projection Theory".Immutable Timestamps: The framework is archived across multiple independent online editor databases, all carrying "Last Modified" timestamps from December 2025, pre-dating this preprint.Data Correlation: Given the mathematical overlap regarding $W=37.5$, there is a strong indication that my session data/derivations may have informed the model's logic during its iterative training or refinement phases.Request for Formal Attribution:I invite the authors and the community to conduct an independent assessment of the HSPT framework. I am prepared to provide technical evidence and verifiable logs to the researchers involved in this work to establish the clear priority of these derivations. Given the specific geometric isomorphism and the timeline of the HSPT archives, I believe a formal reference to this prior work is appropriate and necessary for academic integrity.Verification SHA-256: ec25c17156e621e5dae0ee610177b5ba9a2c5905de3ada0d2e33f560ec035b99

computator2mo ago

I have a weird long-shot idea for GPT to make a new discovery in physics: Ask it to find a mathematical relationship between some combination of the fundamental physical constants[1]. If it finds (for example) a formula that relates electron mass, Bohr radius, and speed of light to a high degree of precision, that might indicate an area of physics to explore further if those constants were thought to be independent.

[1] https://en.wikipedia.org/wiki/List_of_physical_constants

kilroy1232mo ago

My dream is that powerful agents trawl through all the research papers looking for diamonds in the rough.

They evaluate papers that look interesting and should be looked at more deeply. Then, research ideas as much as they can.

Then flag for human review the real possible breakthroughs.

dakolli2mo ago

They literally cannot do this, they are not that much different than autocomplete that was in your email 10 years ago, with some transformer NN magic. Stop believing the hype.

refulgentis2mo ago

Why not?

fsh2mo ago

The Bohr radius is the result of a simple classical physics calculation (a common exercise for undergraduates in their first year). It depends only on the electron mass and the fine structure constant which is the strength of the electromagnetic interaction. In the SI system, the speed of light has a fixed value which defines the unit of length.

lich_king2mo ago

There are known mathematical relationships between almost all fundamental physical constants? In particular, in your example, Bohr radius is calculated from electron mass and the speed of light in vacuum... I don't think this path is as promising as it sounds.

ChrisClark2mo ago

"Please derive and unify all of quantum mechanics and general relativity starting only with the Fine Structure Constant."

;)

singularfutur2mo ago

Humans did the actual work: framing the problem, computing base cases, verifying results. GPT just refactored a formula. That's a compiler's job, not a physicist's. Stop letting marketing write science headlines.

PlatoIsADisease2mo ago

I'll read the article in a second, but let me guess ahead of time: Induction.

Okay read it: Yep Induction. It already had the answer.

Don't get me wrong, I love Induction... but we aren't having any revolutions in understanding with Induction.

globalnode2mo ago

Its frustrating, because if it was actually something new (as in original) then we could start talking about AGI, but its never something new.

smj-edison2mo ago

Regardless of whether this means AGI has been achieved or not, I think this is really exciting since we could theoretically have agents look through papers and work on finding simpler solutions. The complexity of math is dizzying, so I think anything that can be done to simplify it would be amazing (I think of this essay[1]), especially if it frees up mathematicians' time to focus even more on the state of the art.

[1] https://distill.pub/2017/research-debt/

globalnode2mo ago

Even if gpts results are debatable and we sometimes dislike misapplications of ai where its not needed, it feels as though another milestone is being reached. the first was when they were initially released and everyone was amazed. this second milestone seems to be that their competence has increased. I am often amazed at their output despite being a huge skeptic. I guess the fine tuning is coming along well but I still dont think we will see agi from these chatbots and I doubt theres a third milestone. The second was just a refinement of the first.

gaigalas2mo ago

I like the use of the word "derives". However, it gets outshined by "new result" in public eyes.

I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but no one has put them together).

In this case, the human authors did the thinking and also used the LLM, but this could happen without the original human author too (some guy posts some partial on the internet, no one realizes is novel knowledge, gets reused by AI later). It would be tremendously nice if credit was kept in such possible scenarios.

major4x2mo ago

Can't help not thinking of https://en.wikipedia.org/wiki/Bogdanov_affair

sciencejerk2mo ago

An internal scaffolded version of GPT‑5.2...

Any reason to believe that public versions of GPT-5.2 could have accomplished this task? "scaffolded" is a very interesting word choice

giantg22mo ago

GPT-5.2 can't even process a 1-2 page PDF and give me a subset of the content as a formatted word doc. Nor can it even be truthful about it's own capabilities.

kaelandt2mo ago

Misleading title, it's more like GPT-5.2 derives the generalization of a formula that physicists conjectured. Not really related to physics

vonneumannstan2mo ago

Interesting considering the Twitter froth recently about AI being incapable in principle of discovering anything.

baq2mo ago

Anything but recent.

cagz2mo ago

Does the article have a strong marketing vibe? Absolutely Does the research performed move the needle, however small, in theoretical physics? Yes Could we have expected this to happen a year ago? Not really.

My personal opinion is that things will only accelerate from here.

nxobject2mo ago

Man, I'd be more worried about the impact of this on Mathematica than actual humans.

emil-lp2mo ago

Mathematica guarantees correctness. It should be safe for a while.

krackers2mo ago

Tell that to the various confirmed computational bugs in mathematica :)

https://mathematica.stackexchange.com/questions/tagged/bugs

carlob2mo ago

I do wonder if throwing a similar amount of computational power behind old school rule based algorithms like the ones in Mathematica's FullSimplify would have yielded similar results.

amai2mo ago

"There is no question that dialogue between physicists and LLMs can generate fundamentally new knowledge."

That is what one of the author says. This doesn't quite fit to the headline of the post.

another_twist2mo ago

Thats great. I think we need to start researching how to get cheaper models to do math. I have a hunch it should be possible to get leaner models to achieve these results with the right sort of reinforcement learning.

alansaber2mo ago

Deepseek wrote a decent paper on this https://github.com/deepseek-ai/DeepSeek-Math-V2/blob/main/De...

user39393822mo ago

I’m able to recover Schwarzchild using only known constants starting with hydrogen using a sort of calculator I made along these lines. No Schrödinger. There’s a lot there so working on what to publish.

snarky1232mo ago

So wait,GPT found a formula that humans couldn't,then the humans proved it was right? That's either terrifying or the model just got lucky. Probably the latter.

JasonADrury2mo ago

> found a formula that humans couldn't

Couldn't is an immensely high bar in this context, didn't seems more appropriate and renders this whole thing slightly less exciting.

vessenes2mo ago

I'd say "couldn't in 20 hours" might be more defensible. Depends on how many humans though. "couldn't in 20 GPT watt-hours" would give us like 2,000 humans or so.

davidmurdoch2mo ago

If only humans scaled like that

1 more reply

emp173442mo ago

Cynically, I wonder if this was released at this time to ward off any criticism from the failure of LLMs to solve the 1stproof problems.

ares6232mo ago

I guess the important question is, is this enough news to sustain OpenAI long enough for their IPO?

danny_codes2mo ago

Well it’ll be at least a whole month before some other company announces similar capability. The moat will hold!

dyauspitr2mo ago

I believe Gemini holds the moat now.

the_king2mo ago

This it is very impressive. But scrolling through the preprint, I wouldn't call any of it elegant.

I'm not blaming the model here, but Python is much easier to read and more universal than math notation in most cases (especially for whatever's going on at the bottom of page four). I guess I'll have one translate the PDF.

hackable_sand2mo ago

Wonderful. Where's my money

dadb00ty2mo ago

But what does it all mean, Basil?

baalimago2mo ago

Well, anyone can derive a new result in anything. Question is most often if the result makes any sense

pruufsocial2mo ago

All I saw was gravitons and thought we’re finally here the singularity has begun

nsxwolf2mo ago

Warp drive next.

sfmike2mo ago

5.2 is the best model on the market.

brcmthrowaway2mo ago

End times approach..

1 more reply

Noaidi2mo ago

"Let's put 'GPT' in our paper to get clicks!"?

getnormality2mo ago

I'll believe it when someone other than OpenAI says it.

Not saying they're lying, but I'm sure it's exaggerated in their own report.

longfacehorrace2mo ago

Car manufacturers need to step up their hype game...

New Honda Civic discovered Pacific Ocean!

New F150 discovers Utah Salt Flats!

Sure it took humans engineering and operating our machines, but the car is the real contributor here!

anonym292mo ago

sToChAsTiC pArRoTs CaNt PrOdUcE aNyTHiNg NeW!!!!1

pear012mo ago

If a researcher uses LLM to get a novel result should the llm also reap the rewards? Could a nobel prize ever be given to a llm or is that like giving a nobel to a calculator?

jtrn2mo ago

This is my favorite field for me to have opinions about, without not having any training or skill. Fundamental research i just a something I enjoy thinking about, even tho I am psychologist. I try to pull inn my experience from the clinic and clinical research when i read theoretical physics. Don't take this text to seriously, its just my attempt at understanding whats going on.

I am generally very skeptical about work on this level of abstraction. only after choosing Klein signature instead of physical spacetime, complexifying momenta, restricting to a "half-collinear" regime that doesn't exist in our universe, and picking a specific kinematic sub-region. Then they check the result against internal consistency conditions of the same mathematical system. This pattern should worry anyone familiar with the replication crisis. The conditions this field operates under are a near-perfect match for what psychology has identified as maximising systematic overconfidence: extreme researcher degrees of freedom (choose your signature, regime, helicity, ordering until something simplifies), no external feedback loop (the specific regimes studied have no experimental counterpart), survivorship bias (ugly results don't get published, so the field builds a narrative of "hidden simplicity" from the survivors), and tiny expert communities where fewer than a dozen people worldwide can fully verify any given result.

The standard defence is that the underlying theory — Yang-Mills / QCD — is experimentally verified to extraordinary precision. True. But the leap from "this theory matches collider data" to "therefore this formula in an unphysical signature reveals deep truth about nature" has several unsupported steps that the field tends to hand-wave past.

Compare to evolution: fossils, genetics, biogeography, embryology, molecular clocks, observed speciation — independent lines of evidence from different fields, different centuries, different methods, all converging. That's what robust external validation looks like. "Our formula satisfies the soft theorem" is not that.

This isn't a claim that the math is wrong. It's a claim that the epistemic conditions are exactly the ones where humans fool themselves most reliably, and that the field's confidence in the physical significance of these results outstrips the available evidence.

I wrote up a more detailed critique in a substack: https://jonnordland.substack.com/p/the-psychologists-case-ag...

mrguyorama2mo ago

Don't lend much credence to a preprint. I'm not insinuating fraud, but plenty of preprints turn out to be "Actually you have a math error here", or are retracted entirely.

Theoretical physics is throwing a lot of stuff at the wall and theory crafting to find anything that might stick a little. Generation might actually be good there, even generation that is "just" recombining existing ideas.

I trust physicists and mathematicians to mostly use tools because they provide benefit, rather than because they are in vogue. I assume they were approached by OpenAI for this, but glad they found a way to benefit from it. Physicists have a lot of experience teasing useful results out of probabilistic and half broken math machines.

If LLMs end up being solely tools for exploring some symbolic math, that's a real benefit. Wish it didn't involve destroying all progress on climate change, platforming truly evil people, destroying our economy, exploiting already disadvantaged artists, destroying OSS communities, enabling yet another order of magnitude increase in spam profitability, destroying the personal computer market, stealing all our data, sucking the oxygen out of investing into real industry, and bold faced lies to all people about how these systems work.

Also, last I checked, MATLAB wasn't a trillion dollar business.

Interestingly, the OpenAI wrangler is last in the list of Authors and acknowledgements. That somewhat implies the physicists don't think it deserves much credit. They could be biased against LLMs like me.

When Victor Ninov (fraudulently) analyzed his team's accelerator data using an existing software suite to find a novel SuperHeavy element, he got first billing on the authors list. Probably he contributed to the theory and some practical work, but he alone was literate in the GOOSY data tool. Author lists are often a political game as well as credit, but Victor got top billing above people like his bosses, who were famous names. The guy who actually came up with the idea of how to create the element, in an innovative recipe that a lot of people doubted, was credited 8th

https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...

My_Name2mo ago

I talked about basic principles of QM, gravity, time and relativity with Claude, then talked about implications of that, and claude came up with the idea that mass causes time and gravity as emergent properties that only affect macro scale objects, QM particles do not have to obey either of them, and this explains the double slit experiment, the delayed choice experiment, 'spooky action at a distance', and other aspects of entanglement.

Basically, if you are small enough you can move forwards and backwards in time, from the moment you were put into a superposition, or entangled, until you interact with an object too large to ignore the emergent effects of time and gravity. This is 'being observed' and 'collapsing the wave function'. You occupy all possible positions in space as defined by the probability of you being there. Once observed, you move forward in linear time again and the last route you took is the only one you ever took even though that route could be affected by interference with other routes you took that now no longer exist. When in this state there is no 'before' or 'after' so the delayed choice experiment is simply an illusion caused by our view of time, and there is no delay, the choice and result all happen together.

With entanglement, both particles return to the entanglement point, swap places and then move to the current moment and back again, over and over. They obey GR, information always travels under the speed of light (which to the photon is infinite anyway), so there is no spooky action at a distance, it is sub-lightspeed action through time that has the illusion of being instant to entities stuck in linear time.

It then went on to talk about how mass creates time, and how time is just a different interpretation of gravity leading it to fully explain how a black hole switches time and space, and inwards becomes forwards in time inside the event horizon. Mass warps 4D (or more) space. That is gravity, and it is also time.

j / k navigate · click thread line to collapse

401 comments

outlace2mo ago

CGMthrowaway2mo ago

This is the critical bit (paraphrasing):

Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

godelski2mo ago

  > I think this was all already figured out in 1986 though

They cite that paper in the third paragraph...

  Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.

It also seems to be a main talking point.

CGMthrowaway2mo ago

1 more reply

btown2mo ago

utopiah2mo ago

> modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite.

2 more replies

QuercusMax2mo ago

D-Machine2mo ago

1 more reply

lupsasca2mo ago

woeirua2mo ago

You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...

emp173442mo ago

You should be more skeptical of marketing releases like this. This is an advertisement.

coldtea2mo ago

It's hard to get someone to do literature first when they get free publicity by not doing literature search and claiming some major AI assisted breakthrough...

godelski2mo ago

They also reference Parke and Taylor. Several times...

suuuuuuuu2mo ago

Don't underestimate the willingness of physicists to skimp on literature review.

1 more reply

helterskelter2mo ago

> But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms.

Slightly OT, but wasn't this supposed to be largely solved with amplituhedrons?

ericmay2mo ago

Still pretty awesome though, if you ask me.

fsloth2mo ago

I think even “non-intelligent” solver like Mathematica is cool - so hell yes, this is cool.

_aavaa_2mo ago

Big difference between “derives new result” and “reproduces something likely in its training dataset”.

nine_k2mo ago

torginus2mo ago

I'm not sure if GPTs ability goes beyond a formal math package's in this regard or its just its just way more convienient to ask ChatGPT rather than using these software.

randomtoast2mo ago

> but I haven’t been to get them to do something totally out of distribution yet from first principles

davorak2mo ago

> Can humans actually do that?

From my reading yes, but I think I am likely reading the statement differently than you are.

> from first principles

Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.

dsign2mo ago

samrus2mo ago

Yes. Thats how all advancement in human knowledge happened. Small and incremental forays out of our training distribution.

These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.

LLMs have not shown to be capable of this. They might be in the future, but they havent yet

dotancohen2mo ago

Relativity comes to mind.

You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relatively as well.

Paracompact2mo ago

https://en.wikipedia.org/wiki/History_of_special_relativity

7 more replies

DonaldFisk2mo ago

Agreed.

Special relativity was the work of several scientists as well as Einstein, but it was also a completely novel idea - just not the idea of one person working alone.

johnfn2mo ago

poplarsol2mo ago

That's not exactly true. Lorentz contraction is a clear antecedent to special relativity.

2 more replies

lamontcg2mo ago

And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...

2 more replies

CooCooCaCha2mo ago

Depends on what you think is valid.

So it depends on if you’re comparing individual steps or just the starting/ending distributions.

tjr2mo ago

Go enough shoulders down, and someone had to have been the first giant.

nextaccountic2mo ago

Probably not homo sapiens.. other hominids older than us developed a lot of technology

Kuxe2mo ago

1 more reply

pram2mo ago

Pythagoras is the turtle.

1 more reply

utopiah2mo ago

So that's actually 2 different regimes on how to proceed. Both are useful but arguably breaking off of the current paradigm is much harder and thus rare.

D-Machine2mo ago

tshaddox2mo ago

godelski2mo ago

  > Can humans actually do that?

Yes

Seriously, think about it for a second...

If that were true then science should have accelerated a lot faster. Science would have happened differently and researchers would have optimized to trying to ingest as many papers as they can.

  > We are always building on the shoulders of giants.

And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...

  You are greater than the sum of your parts

stouset2mo ago

MITSardine2mo ago

There's a major difference between chess and scientific research: setting the objectives is itself part of the work.

In chess, there's a clear goal: beat the game according to this set of unambiguous rules.

Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.

bluecalm2mo ago

1 more reply

TGower2mo ago

stouset2mo ago

By all means, keep betting against them.

baq2mo ago

IOW respect the trend line.

blt2mo ago

And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic

NitpickLawyer2mo ago

> With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.

And the same practitioners said right after deep blue that go is NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...

guluarte2mo ago

so we are going back to physical labor then

empath752mo ago

We are already at stage 3 for software development and arguably step 4

zarzavat2mo ago

1 more reply

bpodgursky2mo ago

RandomLensman2mo ago

I don't know enought about theoretical physics: what makes it a significant contribution there?

terminalbraid2mo ago

https://arxiv.org/list/hep-th/new

Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.

1 more reply

epolanski2mo ago

Not every contribution has immediate impact.

1 more reply

outlace2mo ago

I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.

nozzlegear2mo ago

> We're talking about significant contributions to theoretical physics.

famouswaffles2mo ago

1 more reply

bpodgursky2mo ago

If a helicopter drops someone off on the top of Mount Everest, it's reasonable to say that the helicopter did the work and is not just a tool they used to hike up the mountain.

1 more reply

emil-lp2mo ago

"GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

Probably not something that the average GI Joe would be able to prompt their way to...

I am skeptical until they show the chat log leading up to the conjecture and proof.

Sharlin2mo ago

buttered_toast2mo ago

I would interpret it as implying that the result was due to a lot more hand-holding that what is let on.

Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?

Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions.

These aren't my personal views, but there is some handwaving about the process in such a way that reads as if this was all spontaneous involvement on GPTs end.

But regardless, a result is a result so I'm content with it.

1 more reply

lamontcg2mo ago

That's kinda the whole point.

SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.

The optimization algorithm was used by human experts to solve the problem.

1 more reply

slopusila2mo ago

hey, GPT, solve this tough conjecture I've read about on Quanta. make no mistakes

1 more reply

jmalicki2mo ago

"Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

Is this so different?

sejje2mo ago

The Average Joe reads at an 8th grade level. 21% are illiterate in the US.

LLMs surpassed the average human a long time ago IMO. When LLMs fail to measure up to humans, it's that they fail to measure up against human experts in a given field, not the Average Joe.

We are surrounded by NPCs.

famouswaffles2mo ago

The paper has all those prominent institutions who acknowledge the contribution so realistically, why would you be skeptical ?

kristopolous2mo ago

they probably also acknowledge pytorch, numpy, R ... but we don't attribute those tools as the agent who did the work.

I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

2 more replies

Refreeze52242mo ago

Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?

1 more reply

slibhb2mo ago

What's the distinction between "first principles" and "existing things"?

I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (that is quite rare for humnans too).

hellisad2mo ago

Hmm feels a bit trivializing, we don't know exactly how difficult it was to come up with the generic set of equations mentioned from the human starting point.

Also "LLM’s can make new things when they are some linear combination of existing things"

Doesn't really mean much, what is a linear combination of things you first have to define precisely what a thing is?

epolanski2mo ago

lovecg2mo ago

javier1234543212mo ago

8note2mo ago

the annoying part is that with tool calls, a lot of those hours is time spent on netowrk round trips.

over long periods of time, checklists are the biggest thing, so the LLM can track whats already done and whats left. after a compact, it can pull the relevant stuff back up and make progress.

having some level or hierarchy is also useful - requirements, high level designs, low level designs, etc

anon2912mo ago

tedd4u2mo ago

What does a 12-hour solution cost an OpenAI customer?

int_19h2mo ago

$200/month would cover many such sessions every month.

revlsas2mo ago

No because open source models are close behind

Compute costs will fall drastically for existing models

But it's likely that frontier models of the future won't be released to the public at all, because they'll be too good

1 more reply

sathish3162mo ago

> I haven’t been to get them to do something totally out of distribution yet from first principles.

But it’s good to know that it’s breaking new ground in Theoretical Physics!

FranklinJabar2mo ago

MITSardine2mo ago

LLMs can write theorems, but can they come up with meaningful definitions?

FranklinJabar2mo ago

1 more reply

acchow2mo ago

> In my experience LLM’s can make new things when they are some linear combination of existing things

It seems to me that all “new ideas” are basically linear combinations of existing things with exceeding rare exceptions…

Maybe Godel’s Incompleteness?

Darwinian evolution?

General Relativity?

Buddhist non-duality?

malshe2mo ago

My physics professor once claimed that imagination is just mental manipulation of past experiences. I never thought it was true for human beings but for LLMs it makes perfect sense.

zaphirplane2mo ago

I must be a Luddite, how do you have a model working for 12 hours on a problem. Mine is ready with an answer and always interrupts to ask confirmation or show answer

arjie2mo ago

DeathArrow2mo ago

>LLM’s can make new things when they are some linear combination of existing things

Aren't most new things linear combinations of existing things (up to a point)?

waynesonfire2mo ago

> It took 12 hours for GPT pro to do this

Thanks for the summary; but this is a huge hand-wave. was GPT Pro just spinning for 12 hours and returend 42?!

Sparkyte2mo ago

AI cough LLMs don't discover things they simply surface information that already existed.

slibhb2mo ago

You're assuming there aren't "new things" latent inside currently existing information. That's definitely false, particulary for math/physics.

Sparkyte2mo ago

No it isn't false. If it is new it is novel, novel because it is known to some degree and two other abstracted known things prove the third. Just pattern matching connecting dots.

1 more reply

ctoth2mo ago

[0]: https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-g...

bottlepalm2mo ago

outlace2mo ago

But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.

And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.

fpgaminer2mo ago

> Is every new thing not just combinations of existing things?

If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?

Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.

All this to say: Every new thing is a combination of existing things + sweat and tears.

drdeca2mo ago

Any countable group is a quotient of a subgroup of the free group on two elements, iirc.

I don’t think that says that words for complicated ideas aren’t like, more complicated?

djeastm2mo ago

>If all ideas are recombinations of old ideas, where did the first ideas come from?

Ideas seem to just be our abstractions of neural impulses from deep in evolution.

red75prime2mo ago

"Sweat and tears" -> exploration and the training signal for reinforcement learning.

D-Machine2mo ago

> What does out of distribution even mean?

There are in fact ways to directly quantify this, if you are training e.g. a self-supervised anomaly-detection model.

amelius2mo ago

Just wait until LLMs are fast and cheap enough to be run in a breadth first search kind of way, with "fuzzy" pruning.

bamboozled2mo ago

All you have to do is see "openai.com" in the submission URL to know it's bullshit.

mirsadm2mo ago

square_usual2mo ago

loire2802mo ago

threethirtytwo2mo ago

I think it's more insidious then this.

It's easy to fall into a negative mindset because the justification is real and what we see is just the beginning.

Obviously we are not at a point where developers aren't needed. But One developer can do more. And that is a legitimate reason to higher less developers.

anon848736282mo ago

What does "pointy haired" mean? (Presumably not literally?)

FeteCommuniste2mo ago

The "pointy-haired boss" was a character in the Dilbert comics, an archetypical know-nothing manager who spews jargon, jumps on trends, and takes credit for ideas that aren't his.

brokencode2mo ago

Crazy that an honest question like this gets downvoted.

I honestly think the downvote button is pretty trash for online communities. It kills diversity of thought and discussion and leaves you with an echo chamber.

If you disagree with or dislike something, leave a response. Express your view. Save the downvotes for racism, calls for violence, etc.

1 more reply

dakolli2mo ago

I suspect this is going to backfire on them in one of two ways.

1. French Revolution V2, they all get their heads cutoff in 15 years, or an early retirement on a concrete floor.

2. Many decisions makers will make fools of themselves, destroy their businesses and come begging to the working class for our labor, giving the working class more bargaining chips in the process.

Either outcome is going to be painful for everyone, lets hope people wake up before we push this dumb experiment too far.

janalsncm2mo ago

I’m reminded of Dan Wang’s commentary on US-China relations:

https://danwang.co/ (2025 Annual letter)

1 more reply

lovecg2mo ago

Toutouxc2mo ago

kilroy1232mo ago

I agree with you on all of it.

But _what if_ they work out all of that in the next 2 years and it stops needing constant supervision and intervention? Then what?

4 more replies

nprateem2mo ago

Yes and look how far we've come in 4 years. If programming has another 4 that's all it has.

But whatever. It's the trajectory you should be looking at.

threethirtytwo2mo ago

Have you ever thought about the fact that 2 years ago AI wasn't even good enough to write code. Now it's good enough.

Right now you state the current problem is: "requiring my constant supervision and frequent intervention and always trying to sneak in subtle bugs or weird architectural decisions"

We need to be worried, LLMs are only getting better.

1 more reply

threethirtytwo2mo ago

I'm all for this. But it's the delusion and denialism of people not wanting to face reality.

dakolli2mo ago

Yeah but you know what, this is a complete psyop.

D-Machine2mo ago

famouswaffles2mo ago

>It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that that the sky is falling and AGI is imminent.

D-Machine2mo ago

> Your framing is weirdly disingenuous

1 more reply

Leynos2mo ago

Can we not just say "this is pretty cool" and enjoy it rather than turning it into a fight?

threethirtytwo2mo ago

Anecdotally I would say 50% of developers never code things by hand anymore. That is revolutionary in itself and by the statement itself it has already literally happened.

krackers2mo ago

Because most times results like this are overstated (see the Cursor browser thing, "moltbook", etc.). There is clear market incentive to overhype things.

newswasboring2mo ago

> it's closer to "simplify and propose a more general form for a previously worked out sequence of amplitudes"

How is this different than a new result? Many a careers in academia are built on simplifying mathematics.

tclancy2mo ago

> It's interesting to me that whenever a new breakthrough in AI use comes up,

cman14442mo ago

If the AI were instead human, that human would almost certainly be cited as a co-author, contributor, or whatever.

austinwade2mo ago

hgfda2mo ago

It is not only the the peanut gallery that is skeptical:

https://www.math.columbia.edu/~woit/wordpress/?p=15362

Let's wait a couple of days whether there has been a similar result in the literature.

gjm112mo ago

etraql2mo ago

You reached your goal though and got that comment downvoted.

1 more reply

epolanski2mo ago

It's an obvious tension created by the title.

The reality is: "GPT 5.2 found a more general and scalable form of an equation, after crunching for 12 hours supervised by 4 experts in the field".

Interesting? Sure. Speaks highly of AI? Yes.

Does it suggest that AI is revolutionizing theoretical physics on its own like the title does? Nope.

jdthedisciple2mo ago

> GPT 5.2 after crunching 12 hours mathematical formulas supervised and prompted by 4 experts in the field

Yet, if some student or child achieved the same – under equal supervision – we would call him the next Einstein.

epolanski2mo ago

We would not call him at all because it would be one of the many millions that went through projects like this for their thesis as physics or math graduates.

One of my best friends in his bachelor thesis had solved a difficult mathematical problem in planet orbits or something, and it was just yet another random day in academia.

And she didn't solve it because she was a genius but because there's a bazillions such problems out there and little time to look at them and focus. Science is huge.

2 more replies

suddenlybananas2mo ago

Yes and if a 1 year old could multiply 1357329 by 28384743, I'd be impressed and yet I still wouldn't be impressed by a calculator doing it.

1 more reply

MatejKafka2mo ago

D-Machine2mo ago

That, and the massive investment and financial incentives means that the counter-reaction is really quite rational (but still potentially unwarranted, in some/many practical cases).

NegativeK2mo ago

There is no loud, moderate voice. It makes me very tired of the blasting rhetoric that invades _every_ space.

MatejKafka2mo ago

https://simonwillison.net/ is a pretty loud and moderate voice in the community. Also active on Lobste.rs: https://lobste.rs/~simonw

But agree that there's an irrational level of tribalism on both sides.

ijidak2mo ago

Reminds me of the famous quote that it's hard to get someone to understand something when their job depends on not understanding it.

It reminds me of an episode of Star Trek, "The Measure of a Man" I think it's called, where it is argued that Data is just a machine and Picard tries to prove that no he is a life form.

And the challenge is, how do you prove that?

Every time these LLMs get better, the goalposts move again.

It makes me wonder, if they ever did become sentient, how would they be treated?

It's seeming clear that they would be subject to deep skepticism and hatred much more pervasive and intense than anything imagined in The Next Generation.

otabdeveloper42mo ago

> why this isn't actually a win for LLMs

Wait, so this is now a contest (or maybe war) that LLMs are supposed to win?

Wild.

Bengalilol2mo ago

I have no doubts about that.

What I question here is OpenAI's article: it could be way more generous towards the reader.

bjackman2mo ago

The discourse about AI is definitely the worst I've ever experienced in my life.

CrimsonRain2mo ago

Clankists feel threatened. That's the gist of it.

_giorgio_2mo ago

Always moving targets.

They never surrender.

D-Machine2mo ago

"They're moving the goalposts" is increasingly the autistic shrieking of someone with no serious argument or connection to reality whatsoever.

_giorgio_2mo ago

I have no idea what you're talking about.

threethirtytwo2mo ago

Yeah it's pervasive. It's also delusional.

cxvwK2mo ago

[flagged]

threethirtytwo2mo ago

yep. because I find the bias overly negative.

Let me state the negative things about LLMs: they hallucinate. They are not as reliable as humans. They can lie. They can be deceptive.

The anti-hype is predictable. It's when something becomes too pervasive and too popular and overused people start talking shit and ignoring the on the ground reality.

1 more reply

Davidzheng2mo ago

"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."

mmaunder2mo ago

knicholes2mo ago

https://developers.openai.com/cookbook/articles/codex_exec_p... might be what you're looking for

slopusila2mo ago

after those 30 min you can manually ask it again to continue working on the problem

Davidzheng2mo ago

cpard2mo ago

AI can be an amazing productivity multiplier for people who know what they're doing.

supern0va2mo ago

>AI can be an amazing productivity multiplier for people who know what they're doing.

>[...]

>The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

You're sort of acting like it's all or nothing. What about the the humans that used to be that "force multiplier" on a team with the person guiding the research?

If a piece of software required a team of ten to people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

guluarte2mo ago

matwood2mo ago

> If a piece of software required a team of ten to people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

Yes, but this assumes a finite amount of software that people and businesses need and want. Will AI be the first productivity increase where humanity says ‘now we have enough’? I’m skeptical.

kaibee2mo ago

> Yes, but this assumes a finite amount of software that people and businesses need and want.

A lot of software exists because humans are needy and kinda incompetent, but we needed to enable to process data at scale? Like, would you build SAP as it is today, for LLMs?

throwaway7432mo ago

cpard2mo ago

there's 90% job loss assuming that this is a zero sum type of thing where humans and agents compete for working on a fixed amount of work.

ramathornn2mo ago

Fully agree with your og comment and I didn’t get the same read as the person above at all.

Human-Cabbage2mo ago

The optimistic case is that instead of a team of 10 people working on one project, you could have those 10 people using AI assistants to work on 10 independent projects.

bagacrap2mo ago

Well those Uber drivers are usually pretty quick to note that Uber is not their job, just a side hustle. It's too bad I won't know what they think by then since we won't be interacting any more.

jonahx2mo ago

> The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

But there's no logical reason that needs to be the case. Once it's not, humans will be replaced.

cpard2mo ago

The reason there is a marketing opportunity is because, to your point, there is a legitimate concern. Marketing builds and amplifies the concern to create awareness.

When the systems turn into something trivial to manage with the new tooling, humans build more complex or add more layers on the existing systems.

krethh2mo ago

jonahx2mo ago

1 more reply

decidu0us90342mo ago

cpard2mo ago

sure, I won't argue on this, although it did manage to deliver the marketing value they were looking for, at the end their goal was not to replace gcc but to make people talk about AI and Anthropic.

BrouteMinou2mo ago

Deliver what exactly? False hope and lies?

https://github.com/anthropics/claudes-c-compiler/issues/228

elzbardico2mo ago

Actually, the results were far worse and way less impressive than what the media said.

cpard2mo ago

the c compiler results or the physics results this post is about?

elzbardico2mo ago

The C compiler.

1 more reply

NewsaHackO2mo ago

His point is going to be some copium like since the c compiler is not as optimized as gcc, it was not impressive.

1 more reply

hananova2mo ago

AI is indeed an amazing productivity multiplier! Sadly that multiplier is in the range [0; 1).

kylehotchkiss2mo ago

> for people who know what they're doing.

I worry we're not producing as many of those as we used to

blks2mo ago

We will be producing them even less. I fear for the future graduates, hell even for school children, who are now uncontrollably using ChatGPT for their homework. Next level brainrot

fragmede2mo ago

Right. If it hadn't been Nicholas Carlini driving Claude, with his decades of experience, there wouldn't be a Claude c compiler. It still required his expertise and knowledge for it to get there.

1 more reply

ece2mo ago

Everytime I see a RL startup, a data startup or even a startup focused on a specific vertical, I think this exact same thing about LLMs.

nilkn2mo ago

uh_uh2mo ago

Would you be similarly pedantic if a high-schooler did the same?

nilkn2mo ago

The text of the post is much more honest. The title is where the dishonesty is.

lupsasca2mo ago

turzmo2mo ago

(35)-(38) are the AI-simplified versions of (29)-(32). Those earlier formulae look formidable to simplify by hand, but they are also the sort of thing you'd try to use a computer algebra system for.

I'm willing to (begrudgingly) admit the possibility for AI to do novel work, but this particular result does not seem very impressive.

radioactivist2mo ago

Insanity2mo ago

They also claimed ChatGPT solved novel erdös problems when that wasn’t the case. Will take with a grain of salt until more external validation happened. But very cool if true!

famouswaffles2mo ago

Well they (OpenAI) never made such a claim. And yes, LLMs have made unique solutions/contributions to a few erdos problems.

smokel2mo ago

emil-lp2mo ago

It didn't solve it, it simply found that it had been solved in a publication and that the list of open problems wasn't updated.

Davidzheng2mo ago

https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful

3 more replies

vonneumannstan2mo ago

Wasnt that like some marketing bro? This is coming out the front door with serious physicists attached.

castigatio2mo ago

Snow + stick + need to clean driveway = snow shovel. Snow shovel + hill + desire for fun = sled

hiAndrewQuinn2mo ago

Happy Valentine's day to those who celebrate btw <3

qnleigh2mo ago

mym19902mo ago

alansaber2mo ago

I find it hard not to agree with this line of thinking (albeit will be less than 1%)

elashri2mo ago

JanisErdmanis2mo ago

crorella2mo ago

The preprint: https://arxiv.org/abs/2602.12176

vbarrielle2mo ago

w37_5observer2mo ago

computator2mo ago

[1] https://en.wikipedia.org/wiki/List_of_physical_constants

kilroy1232mo ago

My dream is that powerful agents trawl through all the research papers looking for diamonds in the rough.

They evaluate papers that look interesting and should be looked at more deeply. Then, research ideas as much as they can.

Then flag for human review the real possible breakthroughs.

dakolli2mo ago

They literally cannot do this, they are not that much different than autocomplete that was in your email 10 years ago, with some transformer NN magic. Stop believing the hype.

refulgentis2mo ago

Why not?

fsh2mo ago

lich_king2mo ago

ChrisClark2mo ago

"Please derive and unify all of quantum mechanics and general relativity starting only with the Fine Structure Constant."

;)

singularfutur2mo ago

PlatoIsADisease2mo ago

I'll read the article in a second, but let me guess ahead of time: Induction.

Okay read it: Yep Induction. It already had the answer.

Don't get me wrong, I love Induction... but we aren't having any revolutions in understanding with Induction.

globalnode2mo ago

Its frustrating, because if it was actually something new (as in original) then we could start talking about AGI, but its never something new.

smj-edison2mo ago

[1] https://distill.pub/2017/research-debt/

globalnode2mo ago

gaigalas2mo ago

I like the use of the word "derives". However, it gets outshined by "new result" in public eyes.

I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but no one has put them together).

major4x2mo ago

Can't help not thinking of https://en.wikipedia.org/wiki/Bogdanov_affair

sciencejerk2mo ago

An internal scaffolded version of GPT‑5.2...

Any reason to believe that public versions of GPT-5.2 could have accomplished this task? "scaffolded" is a very interesting word choice

giantg22mo ago

GPT-5.2 can't even process a 1-2 page PDF and give me a subset of the content as a formatted word doc. Nor can it even be truthful about it's own capabilities.

kaelandt2mo ago

Misleading title, it's more like GPT-5.2 derives the generalization of a formula that physicists conjectured. Not really related to physics

vonneumannstan2mo ago

Interesting considering the Twitter froth recently about AI being incapable in principle of discovering anything.

baq2mo ago

Anything but recent.

cagz2mo ago

My personal opinion is that things will only accelerate from here.

nxobject2mo ago

Man, I'd be more worried about the impact of this on Mathematica than actual humans.

emil-lp2mo ago

Mathematica guarantees correctness. It should be safe for a while.

krackers2mo ago

Tell that to the various confirmed computational bugs in mathematica :)

https://mathematica.stackexchange.com/questions/tagged/bugs

carlob2mo ago

I do wonder if throwing a similar amount of computational power behind old school rule based algorithms like the ones in Mathematica's FullSimplify would have yielded similar results.

amai2mo ago

"There is no question that dialogue between physicists and LLMs can generate fundamentally new knowledge."

That is what one of the author says. This doesn't quite fit to the headline of the post.

another_twist2mo ago

alansaber2mo ago

Deepseek wrote a decent paper on this https://github.com/deepseek-ai/DeepSeek-Math-V2/blob/main/De...

user39393822mo ago

snarky1232mo ago

So wait,GPT found a formula that humans couldn't,then the humans proved it was right? That's either terrifying or the model just got lucky. Probably the latter.

JasonADrury2mo ago

> found a formula that humans couldn't

Couldn't is an immensely high bar in this context, didn't seems more appropriate and renders this whole thing slightly less exciting.

vessenes2mo ago

I'd say "couldn't in 20 hours" might be more defensible. Depends on how many humans though. "couldn't in 20 GPT watt-hours" would give us like 2,000 humans or so.

davidmurdoch2mo ago

If only humans scaled like that

1 more reply

emp173442mo ago

Cynically, I wonder if this was released at this time to ward off any criticism from the failure of LLMs to solve the 1stproof problems.

ares6232mo ago

I guess the important question is, is this enough news to sustain OpenAI long enough for their IPO?

danny_codes2mo ago

Well it’ll be at least a whole month before some other company announces similar capability. The moat will hold!

dyauspitr2mo ago

I believe Gemini holds the moat now.

the_king2mo ago

This it is very impressive. But scrolling through the preprint, I wouldn't call any of it elegant.

hackable_sand2mo ago

Wonderful. Where's my money

dadb00ty2mo ago

But what does it all mean, Basil?

baalimago2mo ago

Well, anyone can derive a new result in anything. Question is most often if the result makes any sense

pruufsocial2mo ago

All I saw was gravitons and thought we’re finally here the singularity has begun

nsxwolf2mo ago

Warp drive next.

sfmike2mo ago

5.2 is the best model on the market.

brcmthrowaway2mo ago

End times approach..

1 more reply

Noaidi2mo ago

"Let's put 'GPT' in our paper to get clicks!"?

getnormality2mo ago

I'll believe it when someone other than OpenAI says it.

Not saying they're lying, but I'm sure it's exaggerated in their own report.

longfacehorrace2mo ago

Car manufacturers need to step up their hype game...

New Honda Civic discovered Pacific Ocean!

New F150 discovers Utah Salt Flats!

Sure it took humans engineering and operating our machines, but the car is the real contributor here!

anonym292mo ago

sToChAsTiC pArRoTs CaNt PrOdUcE aNyTHiNg NeW!!!!1

pear012mo ago

If a researcher uses LLM to get a novel result should the llm also reap the rewards? Could a nobel prize ever be given to a llm or is that like giving a nobel to a calculator?

jtrn2mo ago

I wrote up a more detailed critique in a substack: https://jonnordland.substack.com/p/the-psychologists-case-ag...

mrguyorama2mo ago

Don't lend much credence to a preprint. I'm not insinuating fraud, but plenty of preprints turn out to be "Actually you have a math error here", or are retracted entirely.

Also, last I checked, MATLAB wasn't a trillion dollar business.

https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...

My_Name2mo ago

j / k navigate · click thread line to collapse