Great explanation. To spell it out for those who didn't follow:
Java supports SIMD out of the box, but only when the JIT optimizer spots the opportunity, and it can miss some. With this API, SIMD is far more likely to be used whenever the hardware supports it, and from the first compilation, so performance should improve.
Lucene here is just dealing with plain float[]s, so Valhalla at least shouldn't affect it much. The limiting factor seems to be the sum accumulators, which the optimizer can't reorder because floating-point addition isn't associative.
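
To make the accumulator point concrete, here is a rough sketch in plain Java. The class and method names are made up for illustration and this is not Lucene's actual code. It compares a single-accumulator dot product with a hand-unrolled four-accumulator version, which is roughly the kind of reordering a SIMD reduction performs. The two can return different floats for the same input, which is why the optimizer isn't allowed to make that change on its own.

```java
// Hypothetical sketch, not taken from Lucene.
final class DotProductSketch {

    // One accumulator: each add depends on the previous one, and the
    // order of the float additions is fixed by the source code.
    static float dotSequential(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Four independent accumulators, combined at the end. Here the
    // programmer explicitly accepts a different summation order.
    static float dotFourAccumulators(float[] a, float[] b) {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int bound = a.length & ~3; // largest multiple of 4 <= length
        int i = 0;
        for (; i < bound; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        float sum = (s0 + s1) + (s2 + s3);
        for (; i < a.length; i++) { // scalar tail for leftover elements
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1e8f, 1f, -1e8f, 1f};
        float[] b = {1f, 1f, 1f, 1f};
        System.out.println(dotSequential(a, b));       // prints 1.0
        System.out.println(dotFourAccumulators(a, b)); // prints 0.0
    }
}
```

Near 1e8 a float can only represent multiples of 8, so adding 1 to 1e8 is lost to rounding. Whether the final answer is 1.0 or 0.0 depends purely on the order in which the terms are added.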