undefined | Better HN

0 pointsprodigycorp1mo ago0 comments

the models were objectively horrible

0 comments

They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" models, non "thinking". Inefficient architecture (number of active out of total) but otherwise "decent" models. They got trashed online by bots and chinese shills (I was online that weekend when it happened, it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.

nl1mo ago

> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMAreana and a few other benchmarks but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data

ac291mo ago

> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named, its a 8B param model

nl1mo ago

You are completely right on both counts.

It is a 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694

refulgentis1mo ago

Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

prodigycorpOP1mo ago

Nah I remember how disgusted I felt trying llama 4 maverick and scout. They were both DOA.. couldn't even beat much smaller local models.

refulgentis1mo ago

I'll cosign what you said, simultaneously, yr interlocutor's point is also well-founded and it depresses me it's not better known and sounds so...off...due to conventional wisdom x God King Zuck's misunderstanding his own company and resulting overreaction.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is longest heads up was single-digit days, if any. Most modellers have arrived at 2+ weeks now, there's a lot between spitting out logits and parsing and delivering a response.

alex11381mo ago

Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit

1 more reply

pixel_popping1mo ago

failing non-stop at tool calls on top of that.

owebmaster1mo ago

Thanks for calling me a bot. Llama4 and meta ai sucks

j / k navigate · click thread line to collapse

0 comments

NitpickLawyer1mo ago

nl1mo ago

> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMAreana and a few other benchmarks but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data

ac291mo ago

> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named, its a 8B param model

nl1mo ago

You are completely right on both counts.

It is a 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694

refulgentis1mo ago

Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

prodigycorpOP1mo ago

Nah I remember how disgusted I felt trying llama 4 maverick and scout. They were both DOA.. couldn't even beat much smaller local models.

refulgentis1mo ago

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is longest heads up was single-digit days, if any. Most modellers have arrived at 2+ weeks now, there's a lot between spitting out logits and parsing and delivering a response.

alex11381mo ago

Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit

1 more reply

pixel_popping1mo ago

failing non-stop at tool calls on top of that.

owebmaster1mo ago

Thanks for calling me a bot. Llama4 and meta ai sucks

j / k navigate · click thread line to collapse