Skip to content
Better HN
Top
Best
Ask
Show
New
Jobs
Search
⌘K
0 points
warwickmcintosh
2mo ago
0 comments
Save
Share
The sanitised optimism problem mentioned upthread is the real gap here. Event stream logging tells you what tools were called and in what order, but it doesn't tell you whether the agent's self-reported outcome matches reality.
0 comments
No comments yet.