Grace Hopper's Revenge

(thefuriousopposites.com)

67 points · ashirviskas · 3mo ago · 56 comments

45 comments · 17 top-level

nr378 · 3mo ago · 7 in thread

The data doesn't support the claim that FP is best very well. Elixir tops the table at 97.5%, but C# (88.4%) is OOP and scores almost identically to Racket (88.9%), and Ruby (81.0%) and Java (80.9%) both outscore Scala (78.4%), which is explicitly functional. If FP were the driver, Scala should beat those languages, but it doesn't.

It's tempting to argue that a more constrained language helps, but Rust (62.8%) vs Elixir (97.5%) is an interesting data point here. Both are highly constrained, but in different directions. Elixir's constraints narrow the solution space because you can't mutate, you can't use loops, and you must pattern match, so every constraint eliminates options and funnels you toward fewer valid solutions that the LLM has to search through. Rust adds another constraint that must independently be satisfied on top of solving the actual problem, where the borrow checker doesn't eliminate approaches but adds a second axis of correctness the LLM has to get right simultaneously.

Overall, it seems like languages with strong conventions and ecosystems that narrow the solution space beat languages where there's a thousand ways to do something. Elixir has one build tool, one formatter, one way to do things. C#, Kotlin, and Java have strong ceremony and convention that effectively narrow how you write a program. Meanwhile JS, Python, PHP, and Perl offer endless choices, fragmented ecosystems, and rapidly shifting idioms, and they cluster at the bottom of the table.

klibertp · 3mo ago

Scala is explicitly multiparadigm and offers a lot of advanced OOP features. It also had a Python-like (though reportedly better handled) 2 -> 3 transition, which deprecated some things, removed others, and added a bunch of new ones. Scala has always been complex, and right now it's also chaotic. It's a wonder the models can get that high a score with it, honestly.

Racket is a similarly large PL, with many abstractions built on the metaprogramming primitives it offers. Without looking at the generated code, it's hard to say anything, but I suspect the high score despite that might be because of the Scheme core of Racket: `racket/base` is a much smaller language than `racket`, so if the LLMs keep to it, it might narrow the solution space enough to show different results.

In general, I think you're half-right: the "solution space" size is a factor, but so is its shape - ie. which features specifically are offered and how they interact. A more compact and cohesive language design should yield better results than just a reduced surface area. C is not a huge language, but the features it offers don't lend themselves to writing correct code much. Elixir is both relatively small and strongly steers a programmer towards safer idioms. Racket is big, but the advanced features are opt-in, while the baseline (immutable bindings, pure functions, expressive contracts) is similar to Elixir. Python is both huge and complex; "there's one obvious way to do it" has always been a bit of a joke. Rust is incredibly complex - the idea is that the tooling should allow you to handle that complexity easily, but that requires agents; one-shotting solutions there won't work as well.

ashirviskas OP · 3mo ago

What if it is the quality of data? The internet is full of terrible Python/JS, but probably not terrible Elixir.

deflator · 3mo ago

Seems plausible. I used to refer to StackOverflow before LLMs and a good amount of the examples there were flawed code presented as working. If the LLM had less junk in its training then it might benefit even though the volume of training on that language is lower.

Sharlin · 3mo ago

If we assume that the amount of training data matters at least a bit (which is a very reasonable assumption), I wouldn’t immediately discard the functional hypothesis. Scala’s score is almost equal to Java’s even though there’s probably something like two orders of magnitude less Scala than Java code in the wild. Similarly with C# and Racket.

nr378 · 3mo ago

Yep I think you can reasonably argue that immutability + strong conventions are the most important dimensions (as opposed to FP vs. OOP, as much as I like FP and dislike OOP):

Immutable by convention + Strong conventions: 91.3% - Elixir 97.5%, Kotlin 90.5%, Racket 88.9%, C# 88.4%

Immutable by convention + Fragmented: 78.4% - Scala 78.4% (n=1)

Mutable + Strong conventions: 77.5% - Ruby 81.0%, Swift 78.5%, Julia 78.5%, Dart 78.0%, Go 71.7%

Mutable + Fragmented: 67.9% - Java 80.9%, R 75.8%, C++ 75.8%, Shell 72.9%, Python 65.3%, Perl 64.5%, TS 61.3%, JS 60.9%, PHP 53.8%

(my grouping is somewhat subjective)
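
The group averages above can be re-derived from the per-language scores. A minimal Python sketch, assuming each average is an unweighted mean of the scores listed in its group (the names `GROUPS`, `group_means`, and `MEANS` are illustrative, not from the article or the benchmark). Note that Rust, quoted earlier in the thread at 62.8%, is not placed in any group.

```python
# Scores exactly as quoted in the comment above; the grouping is the
# commenter's own and, by their admission, somewhat subjective.
GROUPS = {
    "Immutable by convention + Strong conventions": {
        "Elixir": 97.5, "Kotlin": 90.5, "Racket": 88.9, "C#": 88.4,
    },
    "Immutable by convention + Fragmented": {
        "Scala": 78.4,
    },
    "Mutable + Strong conventions": {
        "Ruby": 81.0, "Swift": 78.5, "Julia": 78.5, "Dart": 78.0, "Go": 71.7,
    },
    "Mutable + Fragmented": {
        "Java": 80.9, "R": 75.8, "C++": 75.8, "Shell": 72.9, "Python": 65.3,
        "Perl": 64.5, "TS": 61.3, "JS": 60.9, "PHP": 53.8,
    },
}


def group_means(groups):
    """Return the unweighted mean score per group, rounded to one decimal."""
    return {
        name: round(sum(scores.values()) / len(scores), 1)
        for name, scores in groups.items()
    }


MEANS = group_means(GROUPS)
for name, mean in MEANS.items():
    print(f"{name}: {mean}% (n={len(GROUPS[name])})")
```

Under that assumption the computed means match the figures in the comment. With group sizes of 4, 1, 5, and 9, the gaps between groups rest on very few data points.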

f1shy · 3mo ago

I agree with you, but, from the article: "The amount of training data doesn’t matter as much as we thought. Functional paradigms transfer well"

Anyway, I tend to think you are right, and the article is wrong in that sentence. (Or I misinterpreted something?)

I think both the quantity and the quality of the training data have a big influence on the results.

nolist_policy · 3mo ago

TFA's theory also doesn't explain why C++ (75.8%) beats Python, JavaScript and Rust.

stabbles · 3mo ago · 7 in thread

The TL;DR: code should be easy to audit, not easy to write for humans.

The rest is AI-fluff:

> This isn't about optimizing for humans. It's about infrastructure

> But the bottleneck was never creation. It was always verification.

> For software, the load-bearing interface isn't actually code. Code is implementation.

> It's not just the Elixir language design that's remarkable, it's the entire ecosystem.

> The 'hard' languages were never hard. They were just waiting for a mind that didn't need movies.

tyleo · 3mo ago

To put it another way: this article isn’t about the AI fluff, it’s about the two sentences at the top the author wrote themselves. ;)

zeristor · 3mo ago

Perhaps we need an AI to human transformer to remove the AI fluff?

zeristor · 3mo ago

It really is AI fluff.

Are people starting to write and talk in this manner? I see so many YouTube videos where you can see a person reading an AI-written text. It's one thing if the AI wrote it, but another if the human wrote it in the style of an AI.

As someone pointed out to me, the way an AI writes text can be changed so that it is less obvious; it's just that people don't tend to realise that.

ModernMech · 3mo ago

Someone had one of those AI videos on in the background and, I can’t explain it, the ordering of the words is like nails on a chalkboard to me. I’m starting to have a visceral physiological response to AI prose that makes it actually painful to listen to.

The video was a biography about some Olympian, and I could tell the prompt included some facts about her wanting to be a tap dancer as a kid, because the video kept going back to that fact constantly. Every few sentences it would reference “that kid who wanted to be a tap dancer”. By the 6th time it brought up she wanted to be a tap dancer I was ready to scream.

InkCanon · 3mo ago

Whenever I see a sentence of the form:

"X isn't A, it's (something opposite A)" I twitch involuntarily.

arrowsmith · 3mo ago

It's even infecting the highest levels of government:

https://www.pimlicojournal.co.uk/p/mps-are-almost-certainly-...

dist-epoch · 3mo ago

Man, you are bad at TL;DR-ing. You completely left out the main point the article makes: comparing the stateful, mutating object-oriented programming that humans like with the pure functional programming that, according to the author, LLMs presumably thrive in.

jna_sh · 3mo ago · 4 in thread

> The amount of training data doesn’t matter as much as we thought.

Seems a huge assumption to me. From the data one could equally conclude that JavaScript and Python have lower code quality _because_ of the quantity of training data, e.g. more code written by less experienced developers.

zephen · 3mo ago

Don't worry! It'll only get better, as the amount of ouroboros training data explodes.

Weren't we taught that recycling is good?

dormento · 3mo ago

Seems to me that, just like scrum, AI eventually turns terrible code into passable code, and good code into passable code.

We're getting drowned by "good enough". Not "good" mind you, just "good enough".

bsder · 3mo ago

> Don't worry! It'll only get better, as the amount of ouroboros training data explodes.

Well, the AI can at least be constrained to "the code actually compiles, runs, and produces the correct output."

The number of internet answers that can't pass that bar is distressingly high.

mbonnet · 3mo ago

In order, it is best to reduce, reuse, THEN recycle.

skywhopper · 3mo ago · 4 in thread

This article takes a very tiny, questionable bit of data and extrapolates a lot of iffy assertions.

In general I’m tired of the “humans need never, and should never look at the code” LLM triumphalism articles. Do these folks ever work with real systems, I wonder.

dist-epoch · 3mo ago

I remember when "real programmers" were supposed to look at the assembly code generated by compilers because it was bloated, inefficient, and totally unsuitable to use in a real system.

Cue in "non-determinism" retort.

tgv · 3mo ago

Hardware restrictions might have contributed to that. Anyway, analogies and metaphors do not prove what they sneakily try to imply. They might help in thinking about a problem, but they leave out the actual argument, and in this case the jump is substantial.

chrisrhoden · 3mo ago

I think the problem is less determinism than predictability. Hashing algorithms are deterministic.

Will people start .gitignore-ing their src directories and only save prompts?
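
The distinction between deterministic and predictable can be illustrated with a cryptographic hash. A minimal Python sketch using the standard library's `hashlib` (the helper names `sha256_hex` and `differing_hex_digits` and the sample strings are illustrative assumptions): the same input always yields the same digest, yet a one-character change to the input yields a digest with no obvious relation to the first.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(x != y for x, y in zip(a, b))


first = sha256_hex("hello world")
again = sha256_hex("hello world")    # same input, hashed a second time
nearby = sha256_hex("hello world!")  # input changed by one character

# Deterministic: repeating the computation gives the identical digest.
print("same input, same digest:", first == again)

# Not predictable from a nearby input: count how many hex digits changed.
print(differing_hex_digits(first, nearby), "of", len(first), "hex digits differ")
```

This only illustrates the comment's point about hashes; it says nothing about how predictable any particular LLM or compiler is.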

keybored · 3mo ago

That you anticipated a retort isn’t enough. You also have to refute it.

Yeah compilers are deterministic and LLMs are not. The response to that?

The answer could very well be something like what’s in TFA namely formal verification. But an answer here is needed.

ashirviskas OP · 3mo ago · 3 in thread

I found it interesting that Elixir scores so high, but I'm not sure whether I can agree with the cause.

Bolwin · 3mo ago

That benchmark is useless for comparing languages because the tasks are not the same across languages

gostsamo · 3mo ago

How can you argue with so many assertive sentences in the article? They leave no space for critical thinking.

ashirviskas OP · 3mo ago

I'll admit, my brain was DDoSed by the article, and I thought that maybe posting it here would get us someone with a more DDoS-proof brain to dissect it.

arbirk · 3mo ago · 2 in thread

1) AutoCodeBench does not present similar problems across languages, so you can't compare the Upper Bound metric between them. Actually, the Elixir tests are transmuted from other languages.

2) The Tesla section is interesting. I'm not saying that you are wrong, just that their methods have not produced the promised results yet.

3) Wireless humanoid robots are a bad platform because we don't have the hardware to support them. Both battery density and compute efficiency are currently too low to support freestanding robots. RIP Roomba - long live its legacy

evan_a_a · 3mo ago

>2) The Tesla section is interesting. I'm not saying that you are wrong, just that their methods have not produced the promised results yet

The flaw in Tesla's engineering choice to rely on a camera based system for self driving is that it is extremely difficult to approximate human vision with cameras alone. The author also does not mention this and instead assumes that "camera == human eye" which is not true.

>3) Wireless humanoid robots are a bad platform because we don't have the hardware to support them. Both battery density and compute efficiency are currently too low to support freestanding robots. RIP Roomba - long live its legacy

Boston Dynamics already has Atlas, with a 4 hour battery life and the ability to self-swap. That is already better than a human since it can presumably work non-stop for its entire runtime. Plus battery technology and compute efficiency are both still improving.

https://bostondynamics.com/products/atlas/

yfw · 3mo ago

It's hard to understand the article's argument for why cameras are correct when lidar systems are doing well.

ChrisMarshallNY · 3mo ago · 1 in thread

It seems that this is really saying that Functional Programming is The Way.

Maybe so, but I find that the right tool for the right job is important. Trying to use a screwdriver for every task is going to result in bruises and cuts (and gouges in the work).

I've used a pretty "chimeric" approach for most of my career, and the LLM that I use doesn't seem to have an issue with that. Many of its responses seem to take a similar approach.

mtdewcmu · 3mo ago

You could write imperative code in Racket without much difficulty, as long as you don't mind the exclamation points.

ModernMech · 3mo ago

This is so annoying. First it’s obviously AI generated so it’s hard to even read. But if we get past that it’s making all kinds of uncited claims. Did Grace Hopper envision the translation layer moving directly from English to machine code? I don’t know because I can’t trust the LLM to say, and the article does not include a citation — in a piece whose central claim is that AI shifts the burden from coding to verification.

keybored · 3mo ago

Against Flintstone Engineering.[1] That’s great.

I don’t know about the premises here. All of these articles are written to hammer two points.

- AI is the future/AI has been here since X months ago

- There are still people who don’t believe that—to me an unfathomable position as I have personally spent five gazillion tokens on

And the supposed topic of the article is incidental to that.

But if GenAI is the future I’ll take GenAI formal verification and code generation over mindless code generation, thank you very much.

[1] https://news.ycombinator.com/item?id=47358696

WillAdams · 3mo ago

If the tool chain needs to be structured to facilitate people writing programs and documentation and comments, why isn't Literate Programming more popular?

http://literateprogramming.com/

Is LLM what will finally push LP into mainstream acceptance?

https://news.ycombinator.com/item?id=47300747

bronlund · 3mo ago

Brilliant take! We have spent so much time getting stupid people to program, that we have actually made the machines dumber :D

austin-cheney · 3mo ago

> I’m still seeing a decent number of people on Twitter complain occasionally that they’ve tried AI-driven coding workflows and the output is crap and they can move faster by themselves. There’s less of these people in the world of Opus 4.5 and Gemini 3 now, but they’re still there.

The article starts from a false premise: that AI-assisted coding makes the code more understandable. This isn't the case. You either understand the code without AI or offload that reasoning onto the AI, at which point it's not you that understands the code.

A person could argue that AI writes original code that is more understandable at maintenance time than what they could write on their own. This is equally problematic for the same reason. If a person has a lesser understanding of the code at original authoring, they will have a lesser understanding of the edge cases and challenges that went into the reasoning about that original code, and it's those thought challenges which inform the complexities of maintenance, not the simplicity of the base code.

As an analogy, it's like being given a challenging game puzzle to solve. After realizing the game requires extended effort to reach the desired goal, the person searches online for the puzzle solution. At the game's next level they encounter a more challenging puzzle, but they never solved the prior puzzle, and so cannot solve this one. In effect all understanding is destroyed and they have become utterly reliant on spoon-fed solutions they cannot maintain themselves.

le-mark · 3mo ago

I don’t think it’s a surprise that statically typed, immutable functional languages are the easiest (most suitable?) translation target for LLMs. I am surprised Haskell wasn’t represented in the table. OCaml and Standard ML would be interesting as well.

Is there a future in which functional languages finally see wide adoption due to this LLM suitability? I can’t say that I would object!

papa0101 · 3mo ago

Surprised to see Go scoring so low

Archit3ch · 3mo ago

Perhaps Elixir (and by extension FP) is great for standard business logic, the type that's represented in AutoCodeBench. But sometimes your problem domain demands[1] a more expressive/powerful language like LISP.

[1] Not because you can't solve it in C, but because you'll be fighting the language along the way.

yfw · 3mo ago

I don't think this author knows much about Tesla, and now I doubt his credibility.

Chris2048 · 3mo ago

> We built objects with identity and state because that’s how we experience reality

I mean, we called them objects, but coupling related state (and functions) together seems an objectively (object-ively) good way to group data; it's literally just dict-based organisation.
