Analyzing Geekbench 6 under Intel's BOT (opens in new tab)

(geekbench.com)

41 pointshajile2mo ago15 comments

15 comments

14 comments · 7 top-level

boomanaiden1542mo ago· 5 in thread

Post link optimization (PLO) tools have been around for quite a while. In particular, Meta’s BOLT (fully upstream in LLVM) and Google’s Propeller (somewhat upstream in LLVM, but fully open source) have been around for 5+ years at this point.

It doesn’t seem like Intel’s BOT delivers more performance gains, and it is closed source.

tyushk2mo ago

Intel BOT seems to be patches for specific binaries (hence why they didn't see a difference for Geekbench 6.7), unlike BOLT/Propeller which are for arbitrary programs. The second image from their help page [1] showcases this.

[1] https://www.intel.com/content/www/us/en/support/articles/000...

userbinator2mo ago

Applying targeted binary patches shouldn't take 40 seconds... unless that's also a fake "so it looks like it's working really hard" delay.

boomanaiden1542mo ago

I might be thinking of a different project then...

I swore Intel had their own PLO tool, but I can only find https://github.com/clearlinux/distribution/issues/2996.

1 more reply

trynumber92mo ago

Question: do those vectorize code as in the example here? I was of the understanding they performed a more limited subset of optimizations.

boomanaiden1542mo ago

Propeller can’t really do many instruction level modifications due to how it works (constructs a layout file that then gets passed to the linker).

BOLT could do this, but does not as far as I’m aware.

Most of vectorization like this is also probably better done in a compiler middle end. At least in LLVM, the loop vectorizer and especially the SLP Vectorizer do a decent job of picking up most of the gains.

You might be able to pick up some gains by doing it post-link at the MC level, but writing an IR level SLP Vectorizer is already quite difficult.

tyushk2mo ago· 1 in thread

quack3.exe again in a way. If it's been done for years on GPU shaders, then why not CPU code?

consp2mo ago

While highly specific optimisations might give you a tiny bit of advantage, the main boost here is vector code which would work on any processor supporting the instructions. They could have looked at the vendor bits and use those to flag for optimization in any cpu but they didn't and limited it to a small subset of programs and cpus. It tingles the "PR above all else must have highest score" sense.

refulgentis2mo ago· 1 in thread

> BOT optimizations are poorly documented, aggressive in scope, and damage comparability with other CPUs. For example, BOT allows Intel processors to run vector instructions while other processors continue to run scalar instructions. This provides an unfair advantage to Intel

Wait until they hear about branch predictors.

1una2mo ago

The thing is BOT only applies to a handful of applications. So Geekbench scores with BOT applied aren't as representative.

userbinator2mo ago

This suggests the checksum is used to identify whether the binary is known to BOT, and thus whether BOT can optimize the binary.

I do wonder what this "optimize" step actually entails; does it just replace the binary with one that Intel themselves carefully decompiled and then hand-optimised? If it's a general "decompile-analyse-optimise-recompile" (perhaps something similar to what the https://en.wikipedia.org/wiki/Transmeta_Crusoe does), why restrict it?

aurareturn2mo ago

FYI, Geekbench 6 already optimizes for AVX512. Intel just optimizes it even more for them.

I'll take the side of Geekbench here. There is no reason for Intel to optimize a benchmark tool except to cheat. The goal of GB is to test how typical applications run, not the maximum performance possible under ideal scenarios.

fxtentacle2mo ago

To me, the whole thing sounds like cheating in benchmarks.

Intel built a tool that will only activate for a specific benchmark - but not for real-world software which accomplishes similar things - and then the tool will replace generic bytecode with a (most likely) handcrafted and optimized variant for running this specific benchmark on this specific CPU. That means BOT will only boost the benchmark score, but not help at all with the end-user workflows that the benchmark is trying to emulate. Thereby, Intel's BOT makes the benchmark score misleading, which is why Geekbench is flagging them.

whatever12mo ago

Can we also end user tune our cpus for specific tasks we do?

j / k navigate · click thread line to collapse

15 comments

14 comments · 7 top-level

boomanaiden1542mo ago· 5 in thread

It doesn’t seem like Intel’s BOT delivers more performance gains, and it is closed source.

tyushk2mo ago

[1] https://www.intel.com/content/www/us/en/support/articles/000...

userbinator2mo ago

Applying targeted binary patches shouldn't take 40 seconds... unless that's also a fake "so it looks like it's working really hard" delay.

boomanaiden1542mo ago

I might be thinking of a different project then...

I swore Intel had their own PLO tool, but I can only find https://github.com/clearlinux/distribution/issues/2996.

1 more reply

trynumber92mo ago

Question: do those vectorize code as in the example here? I was of the understanding they performed a more limited subset of optimizations.

boomanaiden1542mo ago

Propeller can’t really do many instruction level modifications due to how it works (constructs a layout file that then gets passed to the linker).

BOLT could do this, but does not as far as I’m aware.

You might be able to pick up some gains by doing it post-link at the MC level, but writing an IR level SLP Vectorizer is already quite difficult.

tyushk2mo ago· 1 in thread

quack3.exe again in a way. If it's been done for years on GPU shaders, then why not CPU code?

consp2mo ago

refulgentis2mo ago· 1 in thread

Wait until they hear about branch predictors.

1una2mo ago

The thing is BOT only applies to a handful of applications. So Geekbench scores with BOT applied aren't as representative.

userbinator2mo ago

This suggests the checksum is used to identify whether the binary is known to BOT, and thus whether BOT can optimize the binary.

aurareturn2mo ago

FYI, Geekbench 6 already optimizes for AVX512. Intel just optimizes it even more for them.

fxtentacle2mo ago

To me, the whole thing sounds like cheating in benchmarks.

whatever12mo ago

Can we also end user tune our cpus for specific tasks we do?

j / k navigate · click thread line to collapse