1. That’s plainly not the case here, since other languages are allowed to use custom allocators.
2. Why use a binary tree benchmark in the first place if you’re going to restrict one language to certain naive implementations? Why not just measure allocations outright, or at least call the benchmark “allocator performance”?
3. Showing allocation performance doesn’t help anyone understand the actual performance of the language, which is transparently what everyone uses these benchmarks for. If people wanted a general idea of language performance, they would allow trivial, idiomatic optimizations. A benchmark that shows allocation performance is worthless, and a suite that includes a benchmark for allocation performance but not GC latency is worse than worthless: it’s misleading, because latency is the more important concern, and latency is exactly what these bump allocators trade away to get their fast allocation throughput.