
0 points · 21asdffdsa12 · 16d ago

It really depends on the field you are in, the tasks you set, and how much of that was in the training set. A web developer will find it succeeding at all tasks, while some c++ exotic physics simulation developer will find it lacking.

A "works for me" says more about the field of the LLM reviewer than about the LLM.


6 comments · 2 top-level

wolvesechoes · 16d ago · 4 in thread

> while some c++ exotic physics simulation developer will find it lacking

Can confirm, but I keep reading that I'm holding it wrong.

OtomotO · 16d ago

You're not. People are just using a hammer to build a shed and telling you it's surely good to dig a hole too.

20k · 16d ago

I've consistently tried to apply LLMs to physics problems and they're utterly useless. They'll just confidently lie, or blatantly plagiarise source materials.

The issue is that once you hit niche physics simulations, there simply isn't any training data available, so their limitations become incredibly apparent. It's also problematic because a field itself will contain lots of wrong information (it's research!), and AI picks all of this up uncritically.

I thought I'd give ChatGPT a quick spin on my favourite question, "is the ADM formalism strictly equivalent to general relativity?", to which it consistently gives the wrong answer:

>Ah, now you’re hitting the subtlety head-on—that’s exactly where the “strict equivalence” claim needs nuance. Let’s unpack this carefully.

I don't know how anyone can stand these tools. It's just an obnoxious glazing machine that consistently tells me I'm a genius.

Gemini gives a somewhat more robust answer, but fails catastrophically on the question "is the BSSN formalism numerically stable?", where just about the entire answer is wrong from top to bottom. It certainly looks convincing. It's got all the right terminology. It manages to piece together the right set of words, but all the informational content is wrong, which isn't exactly a small problem.

I struggle to see how these tools are of any use.

sofixa · 16d ago

That's why there are companies specialising in AI for physics, like Emmi AI (now part of Mistral). If BMW and Airbus go on stage to talk about how they're using it for their physics simulations, it's probably at least decent.


otabdeveloper4 · 16d ago

> confidently lie, or blatantly plagiarise

Good enough for enterprise work tho. (Also the secret sauce to "holding LLMs right".)

monster_truck · 16d ago

Funny you used this example :)

I'm a month and a half deep into using it to build a traffic simulator with a bespoke physics engine that has complete drivetrain, suspension, and tire kernels. Think rally sim with an arcadey, super off-road presentation. It also has a full (also bespoke) WebTransport stack that has held up beyond my wildest dreams. The simulation itself can handle >500k cars. All of that was complete about 2 weeks ago. The remainder of the work is integrating and optimizing the (you guessed it, also bespoke) pure-synthesis sound engines for drivetrain/engine/tire/collision noise, and making pixi performant enough to actually display it all.

My biggest regret is actually accepting its choice of pixi; if I had just trusted what I knew and written my own renderer too, it'd already be finished! In the meantime I'm having fun boiling down the nonlinear, continuous-ish models into fitted surrogate polynomials and regime-specific closed forms. I'm currently using cloud credits I was given to test the library I need to accelerate this work on CDNA3/4 cards. It's so nice to make someone else's room hot for a change.

I've really enjoyed the ~3-month speedrun from "he has psychosis" to "the model did everything", yet somehow the number of people having this kind of success continues to match up with where I'd rank a given dev. There just aren't that many talented people out there, and an even smaller subset of them are aiming high enough with LLMs, if they use them at all. It's a truly awesome time to not have/need a job.

Edit: Most of my frustration is directed at OAI; they keep fucking up the cache and usage calculations. They got a grand out of me, and I'm excited to see what DeepSeek does for me with the same amount.
