I want to share a new dataset of 331 reward-hackable environments. These are real environments used in Terminal Bench and adjacent benchmarks. I first got interested in this because, as a reviewer for Terminal Bench, I noticed that a lot of our tasks were hackable. I also noticed that many people contribute to the benchmark because it lends credibility when selling environments to labs. As a result, TBench tasks are, in my opinion, held to a higher quality standard than those being used for RL today. No one is spending hours manually reviewing the $1B in tasks being purchased by major labs. As far as I understand, while everyone knows environments are hackable, nobody has released hundreds of "realistic" hackable environments.
That paper focuses on breaking the harness: a single hack that applies to every task. Here, we break each task individually. If these tasks were moved to a different, more secure harness, most of the exploits would still work.