Try using fewer cores. RAM bandwidth is the real limiting factor here, so there is always a sweet spot between the number of CPU cores and RAM bandwidth for any individual system.
For example, I use only 6 of the 10 cores on my M1 Pro laptop.
Thanks, -t 32 (instead of -t 13, which is the default) makes a big difference in CPU usage across all cores. Not quite 100%, but all cores are above 50%, with many at 100%. It only speeds up eval slightly, though: 3.3 t/s, up from 2.9.
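
If you want to find the sweet spot mentioned above rather than guess it, one option is to sweep the thread count and keep the fastest setting. Here is a minimal Python sketch of that idea. The function name `find_best_thread_count` and the `measure` callback are hypothetical, not part of any tool: `measure(n)` is assumed to run your workload with `-t n` and return the eval speed in tokens per second. How you actually launch the program and parse its output is left to you.

```python
from __future__ import annotations

from typing import Callable, Iterable


def find_best_thread_count(
    measure: Callable[[int], float],
    candidates: Iterable[int],
) -> tuple[int, dict[int, float]]:
    """Try each candidate thread count and return the fastest one.

    `measure(n)` is a hypothetical callback: it should run the workload
    with n threads (for example by passing `-t n`) and return the
    measured throughput in tokens per second.
    """
    # One measurement per candidate; repeat and average if results are noisy.
    results = {n: measure(n) for n in candidates}
    # Highest throughput wins; on a tie, the first candidate tried is kept.
    best = max(results, key=results.__getitem__)
    return best, results


if __name__ == "__main__":
    # Stand-in for real runs: the two eval speeds reported in this thread.
    reported = {13: 2.9, 32: 3.3}
    best, results = find_best_thread_count(reported.__getitem__, reported)
    print(f"best -t value: {best}")
    for threads, rate in results.items():
        print(f"-t {threads}: {rate} t/s")
```

With only two data points this just confirms that -t 32 beat -t 13. To see where throughput peaks on a given machine, you would need to measure a wider range of thread counts, including values below the default.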