In our small and humble internal evals it regularly beats other frontier models on some tasks. The shape of capability is really not intuitive or one-dimensional.
Use a specific monad transformer regularly? It'll use that pattern, and often very well, handling all the wrapping and unwrapping needed to move data types about (at least well enough that the odd case where it misses some wrapping or unwrapping is easy to spot and manage).
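For anyone unfamiliar with what that wrapping/unwrapping looks like, here's a minimal sketch using StateT over IO from the transformers package; the particular stack and the tick function are my own illustration, not from any codebase discussed here. The lift calls are exactly the boilerplate the model has to get right.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, put)

-- StateT layers an Int counter over IO, so plain IO actions
-- must be wrapped with lift to run inside the stack.
tick :: StateT Int IO ()
tick = do
  n <- get                             -- runs in the StateT layer
  lift (putStrLn ("tick " ++ show n))  -- IO action lifted into the stack
  put (n + 1)

main :: IO ()
main = evalStateT (tick >> tick >> tick) 0
```

Forget a lift (or add one too many) and the type error is immediate and obvious, which is why the occasional miss is easy to spot and manage.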
Give a custom GPT or GEM the same source files, and those models regularly fail to maintain style and context, often suggesting solutions that might be fine in isolation but make little sense in the context of the larger codebase. It's almost as if they never reliably refer to the code included in the project/GPT/GEM.
Claude, on the other hand, is so consistent about referring to existing artifacts that, as you approach the limit of project size (which is admittedly small), you can use up your entire 5-hour block of credits with just a few back-and-forths.
After an hour I had a half-assed working result; I put everything into Claude and it made it significantly better on the first try, and I didn't even have an active Claude subscription.
I find it concerning that there are no truly accurate benchmarks for this stuff that we can all agree on.