Scaling up is one of those easier-said-than-done things. Until Apple actually does it, there's no reason to assume they can.
Cavium is sort of your proof that scaling is hard. The 32-core ThunderX2 @ 2.2 GHz with 56 PCIe lanes has the same TDP as the 32-core AMD Epyc with 128 PCIe lanes. Yet it's slower than the competition from AMD and Intel at comparable power budgets.
Sure, Cavium is not yet competitive for compute-heavy workloads that make use of vectorization. For everything else, e.g. memory-bandwidth-bound algorithms, they are quite competitive, from what I've gathered in benchmarks. And those kinds of workloads are actually quite common in HPC, from my POV.