Indeed, we did 100 Gbps about a year ago on an 8-core machine.
Usually the only reason to use many cores is SSL, but since OpenSSL 3.0 totally collapses under load, even then you're forced to significantly lower the number of threads (or to downgrade to 1.1.1, which is what just about every high-traffic site does).
Horrible locking. 95% of CPU time spent in spinlocks. We're still doing measurements, which we'll report with all the data shortly. Anyway, many similar reports have already been collected by the project; there are so many that they created a meta-issue to link to them: https://github.com/openssl/openssl/issues/17627#issuecomment...
3.1-dev is slightly less bad but still far behind 1.1.1. They made it too dynamic, and certain symbols that used to be constants or macros have become functions that walk lists under a lock. We noticed the worst degradation in client mode, where performance was divided by 200 with 48 threads, making it literally unusable.
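
To make the "constant became a function walking a list under a lock" point concrete, here's a toy sketch. It is not OpenSSL's code: `Registry`, `lookup` and `run_workers` are made-up names, and it only counts how often the shared lock is taken rather than measuring real contention. The idea is just that a per-operation lookup hits the lock once per operation in every thread, while resolving the value once and reusing it hits it once per thread.

```python
import threading


class Registry:
    """Hypothetical algorithm registry: a plain list guarded by one shared lock."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._entries = [(name, ident) for ident, name in enumerate(names)]
        self.lock_acquisitions = 0

    def lookup(self, name):
        # Every call takes the shared lock and walks the list linearly.
        with self._lock:
            self.lock_acquisitions += 1
            for entry_name, ident in self._entries:
                if entry_name == name:
                    return ident
        raise KeyError(name)


def run_workers(registry, n_threads, n_ops, cache_once):
    """Run n_threads workers doing n_ops operations; return total lock acquisitions."""

    def worker():
        if cache_once:
            # Resolve once per thread, then reuse it like a constant.
            ident = registry.lookup("sha256")
            for _ in range(n_ops):
                _ = ident
        else:
            # Resolve on every operation: one lock round-trip each time.
            for _ in range(n_ops):
                _ = registry.lookup("sha256")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry.lock_acquisitions


NAMES = ["md5", "sha1", "sha256", "aes-128-gcm"]
per_call = run_workers(Registry(NAMES), n_threads=4, n_ops=1000, cache_once=False)
cached = run_workers(Registry(NAMES), n_threads=4, n_ops=1000, cache_once=True)
print(per_call, cached)  # 4000 4
```

With real threads and a real spinlock, each of those extra acquisitions is a chance for cores to fight over the same cache line, which is why this kind of pattern gets worse as the thread count goes up.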