you used the word "identical" to describe it, not me
words matter
which is why I still think this is a terrible idea. I don't think it holds up in the general case, and as a peer reviewer I would be inclined to suspect that benchmark filtering is what produces the good results.
You should use the same benchmarks everyone else uses when you write your paper