Yes, performance involves experiments, but are they scientific experiments? I'm not sure it matters either way; it's a purely semantic debate. The issues that create replicability problems in CS are pretty different to the ones that create them in other fields. My experience was that they're purely engineering problems rather than problems of incentives or non-replicable designs. If the issue were just that some papers occasionally don't replicate because the authors forgot a detail, and that the authors get queried and update the paper, then nobody would care about it. It gets attention because that's sadly not what happens.