> Bash(DATABASE_URL=$(grep -E '^DATABASE_URL=' .env 2>/dev/null | head -1) echo "ok")
even though I have in CLAUDE.md:
> For database queries, use tidewave first.
I then prompted:
> use tidewave as per CLAUDE.md. also diagnose why you failed to heed that
> ● Diagnosis first: I defaulted to shell habits (env grep → psql) instead of pausing to recall the CLAUDE.md rule that tidewave is the first-line DB tool. The trigger was "look at this record" — I should have read that as "run a SQL query" and reached for tidewave immediately.
If Opus 4.7 doesn't follow simple CLAUDE.md instructions, I'm not sure what benefits other markdown files could bring. I don't trust Opus's own explanation, but it could point to the fact that the system prompt for bash is much longer than CLAUDE.md with tidewave.
While LLM judging could be helpful, I think the tool-call assertions (https://github.com/darkrishabh/agent-skills-eval#what-you-ge...) may be the most useful thing in agent-skills-eval given that it's the only objective measure of compliance.