
polygamous_bat 2y ago

I assume these landing pages are made for Wall St. analysts rather than people who understand LLM eval methods.

bryanh 2y ago

True, but even some of the apples-to-apples comparison is favorable to Gemini: Ultra 90.04% CoT@32 vs. GPT-4 87.29% CoT@32 (via API).

dongobread 2y ago

This isn't apples to apples - they're taking the optimal prompting technique for their own model, then using that technique for both models. They should be comparing it against the optimal prompting technique for GPT-4.

rockinghigh 2y ago

Showing dominance in AI is also targeted at their enterprise customers who spend millions on Google Cloud services.
