The Big Tech bots provide proven value to most sites. Over the years they have also proven that they respect robots.txt, including crawl-rate directives.
Say you manage a site with millions of pages. Over the course of a couple of years, you see dozens of new crawlers start requesting at the same volume as Google. Some of them crawl at a rate high enough, and without any ramp-up period, to degrade services and wake up your on-call engineers, and you can't identify any benefit to you from them. What are you going to do? Are you going to pay a lot more to stop scaling down your cluster during off-peak traffic, or are you going to start blocking bots?
Cloudflare happens to be the largest provider of anti-DDoS and bot protection services, but if it weren't them, it'd be someone else. I miss the open web, but I understand why site operators don't want to waste bandwidth and compute on high-volume bots that offer them no clear value.
Yes, this does make it much harder for non-incumbents, and I don't know what to do about that.