ETA: Since you are using Go and targeting a specific modern CPU, you may also get a measurable benefit from setting GOAMD64=v3, so that the Go compiler generates code using AVX2, BMI2, LZCNT, etc.
> Since you are using Go and targeting a specific modern CPU, you may also get a measurable benefit from setting GOAMD64=v3
That's actually a long-pending open issue in our backlog :)