Snapshot of the results (sorry for the busted format; ask your LLM for dataviz. I can't seem to format a good table in the comments)
Opus 4.7 on GraphQL-go-tools:
Low: 23/29 pass, 10/29 equivalent, 5/29 review-pass, custom avg 2.598, $2.50/task, 384s/task
Medium: 28/29 pass, 14/29 equivalent, 10/29 review-pass, custom avg 2.759, $3.15/task, 451s/task
High: 26/29 pass, 12/29 equivalent, 7/29 review-pass, custom avg 2.670, $5.01/task, 716s/task
Xhigh: 25/29 pass, 11/29 equivalent, 4/29 review-pass, custom avg 2.669, $6.51/task, 804s/task
Max: 27/29 pass, 13/29 equivalent, 8/29 review-pass, custom avg 2.690, $8.84/task, 997s/task
(custom avg is the average score over a set of rubrics used for LLM-as-a-judge, graded out of 4)
Practically, the results indicate that medium gives better outcomes than the higher reasoning efforts, or at least the same outcomes once you account for variance, at a much lower cost and time per task.
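If you want to check the variance point yourself, here is a rough Python sketch using the pass counts and costs above. The cost-per-pass metric and the binomial standard error are assumptions on my part about how to compare the runs, not something the benchmark itself reports, and `summarize` is just a hypothetical helper name.

```python
import math

N_TASKS = 29  # tasks per reasoning-effort level, from the results above

# (effort, tasks passed, cost per task in USD, seconds per task)
results = [
    ("Low", 23, 2.50, 384),
    ("Medium", 28, 3.15, 451),
    ("High", 26, 5.01, 716),
    ("Xhigh", 25, 6.51, 804),
    ("Max", 27, 8.84, 997),
]


def summarize(passes, cost_per_task, n=N_TASKS):
    """Return (pass rate, binomial standard error, cost per passing task)."""
    rate = passes / n
    # Standard error of a proportion; a crude stand-in for run-to-run variance.
    std_err = math.sqrt(rate * (1 - rate) / n)
    # Total spend across all n tasks divided by the number that passed.
    cost_per_pass = cost_per_task * n / passes
    return rate, std_err, cost_per_pass


for effort, passes, cost, seconds in results:
    rate, std_err, cost_per_pass = summarize(passes, cost)
    print(
        f"{effort:<7} pass rate {rate:.3f} +/- {std_err:.3f}, "
        f"${cost_per_pass:.2f} per passing task, {seconds}s/task"
    )
```

With only 29 tasks, a single task is about 3.4 percentage points of pass rate, so gaps of one or two tasks between effort levels are hard to tell apart from noise.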