How so?
A caching proxy costs you almost nothing and will serve thousands of requests per second on ancient hardware. In fact, there has never been a better time in the history of the Internet to run competing search engines: performance, bandwidth, and software have never been so abundant, at historically low prices or for free.
There are so many other bots/scrapers out there that return literally nothing* that I don’t blame site owners for blocking all bots except Googlebot.
Would it be nice if they also allowed altruist-bot or common-crawler-bot? Maybe, but that’s their call and a lot of them have made it on a rational basis.
* or are perceived to return nothing
I run a number of sites with decent traffic, and spam/scam requests outnumber crawler requests 1000 to 1.
I would guess that the number of sites allowing just Googlebot is 0.
I doubt this is happening outside of a few small hobbyist websites where crawler traffic looks significant relative to human traffic. Even among those, it’s so common to move to static hosting with essentially zero cost and/or sign up for free tiers of CDNs that it’s just not worth it outside of edge cases like trying to host public-facing GitLab instances with large projects.
Even then, the ROI on setting up proper caching and rate limiting far outweighs the ROI on trying to play whack-a-mole with non-Google bots.
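To make "proper rate limiting" concrete, here's a minimal sketch of a per-client token bucket, which is one common way to do it. The class name, the parameters, and the idea of keying on a client string (IP, user agent, whatever you like) are my own assumptions for illustration, not something from any particular server or product.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: bursts up to `capacity`,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so it can be tested
        self.buckets = {}         # client key -> (tokens, last timestamp)

    def allow(self, client):
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill for the elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

In practice you'd more likely configure this in the reverse proxy or CDN than write it yourself, but the point is the same: it limits everyone equally by behavior, so you don't have to maintain a list of which bots are "good".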
Even if someone did go to such lengths to block the majority of bots, I have a really hard time believing they wouldn’t take the extra 10 minutes to look up the other major crawlers and put those on the allow list, too.
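For what it's worth, the "extra 10 minutes" version looks something like the sketch below: a robots.txt that allows a few named crawlers and disallows everyone else, checked with Python's standard `urllib.robotparser`. The choice of bots (Googlebot, bingbot, Common Crawl's CCBot) and the example.com URL are just my assumptions for illustration. Also keep in mind robots.txt is advisory, so this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative allow list: three named crawlers, everyone else disallowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url="https://example.com/page"):
    """Return True if this robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("Googlebot", "bingbot", "CCBot", "SomeRandomScraper"):
    print(bot, allowed(bot))
```

Adding another well-behaved crawler is three more lines in the file, which is why stopping at Googlebot alone seems unlikely to me.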
This whole argument about sites going to great lengths to block search indexers but then stopping just short of allowing a couple more of the well-known ones feels like mental gymnastics for a situation that doesn’t occur.
That's not it. They're going to great lengths to block all bot traffic because of abusive and generally incompetent actors chewing through their resources. As evidence, Anubis has made the front page of HN several times within the past couple of months. It is far from the first or only solution in that space, merely one of many alternatives to the solutions provided by centralized services such as Cloudflare.
So practically, there's very little value in allowing those. I usually don't bother blocking them, but if my content weren't easy to cache, I probably would.
This isn't a "natural" monopoly, it's more like Internet Explorer 6.0 and everyone designing their sites to use ActiveX and IE-specific quirks.