Skip to content
Better HN
Top
New
Best
Ask
Show
Jobs
Search
⌘K
undefined | Better HN
0 points
warwickmcintosh
1mo ago
0 comments
Share
The sanitised optimism problem mentioned upthread is the real gap here. Event stream logging tells you what tools were called and in what order, but it doesn't tell you whether the agent's self-reported outcome matches reality.
0 comments
No comments yet.