I implemented a toy Cypher database (samsquire/hash-db) and I just test it with a Python script. I have yet to benchmark it; the performance is probably poor.
I tried running standardised SQL benchmarks against MySQL, but the benchmark code had fallen behind the MySQL client, and keeping it up to date is work in itself.
I inherited a Jepsen suite for testing ActiveMQ, and it wasn't easy to understand.
Testing can be a full time job!