The eval bar I want to see here is simple: given a complex objective (e.g., deploying to prod via a git workflow), how many sequential tasks can GPT-5 stay on track with before it goes off the rails? Context is king, and it's the most obvious and glaring weakness of current models.
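The metric I'm describing could be sketched roughly like this: feed the model one task at a time toward a single objective and count how many it completes before the first failure. Everything here is hypothetical scaffolding (`run_task`, the task list, the pass/fail judgment), not a real benchmark harness.

```python
# Hypothetical sketch: count consecutive tasks completed toward one
# objective before the model first derails. `run_task` is a placeholder
# for whatever executes a task and judges success.
from typing import Callable, List

def tasks_before_derail(tasks: List[str], run_task: Callable[[str], bool]) -> int:
    """Return how many consecutive tasks succeed before the first failure."""
    completed = 0
    for task in tasks:
        if not run_task(task):
            break  # the model lost the plot; stop counting here
        completed += 1
    return completed

# Toy usage: a fake runner that fails on "open PR".
if __name__ == "__main__":
    demo_tasks = ["create branch", "commit fix", "open PR", "deploy"]
    fake_runner = lambda t: t != "open PR"
    print(tasks_before_derail(demo_tasks, fake_runner))  # → 2
```

The interesting part isn't the loop, of course; it's defining "on track" per task, which is where context length and coherence actually get stressed.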