How about taking a small, real open-source project that has an AGENTS.md and showing how to add evals and optimize that file?
The question I have is: what are we optimizing for and how do we measure it?
In your own repos, I see you have a fork of safepass, which seems like a nice, simple project, but it doesn't have an AGENTS.md yet.