Also, not all types of company will provide API endpoints. It all depends on the type of site - for example, an online shop might not wish to hand easily accessible data on its products and prices to competitors who may wish to undercut it. Why would an online shop do that?
I absolutely would pay for an API that provides that data. I'd be willing to pay 10x more than the cost of maintaining and running the scrapers.
But the sites being scraped have no interest in that.
Because right now, the bots - which comprise probably 2/3 of my traffic - are causing me huge headaches, and I sure wish that the people running them would tell me what the heck they want.
The scraping company WILL use the API/CSV file... they will probably also still charge their customer for scraping, so it's a win-win :D
You can think of it this way: the prices and product data are already publicly visible on the website. There are no real secrets, and none of it is password protected.
You can be principled and insist on blocking bots, spending a lot of time and money on tools, people, and ultimately hosting, because the bots will always win; or you can offer the data for free or a minimal fee, cache it, and serve it at almost zero cost from a micro-sized server.
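To make the "cache it and serve it at almost zero cost" option concrete, here is a rough Python sketch, not anyone's actual setup. The catalogue (`PRODUCTS`) and the helper names (`build_feed`, `respond`) are made up for illustration. The idea is to render the price list as CSV once, derive an ETag from it, and answer repeat requests with a bodiless 304 so well-behaved clients and any cache in front of the server do most of the work.

```python
import csv
import hashlib
import io

# Hypothetical catalogue; a real shop would pull this from its database.
PRODUCTS = [
    {"sku": "A100", "name": "Widget", "price": "9.99"},
    {"sku": "B200", "name": "Gadget", "price": "24.50"},
]


def build_feed(products):
    """Render the catalogue as CSV once and derive an ETag from its bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["sku", "name", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(products)
    body = buf.getvalue().encode("utf-8")
    # A short content hash is enough to tell one feed version from another.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return body, etag


def respond(feed, if_none_match=None, max_age=3600):
    """Return (status, headers, body) for a GET request on the feed."""
    body, etag = feed
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    if if_none_match == etag:
        # The client already holds this version: send no body at all.
        return 304, headers, b""
    headers["Content-Type"] = "text/csv; charset=utf-8"
    return 200, headers, body


# Built once (e.g. whenever prices change), then reused for every request.
FEED = build_feed(PRODUCTS)
```

A client that sends back the ETag it got last time in `If-None-Match` costs you a few hundred bytes per poll instead of a full page render, and the `Cache-Control` header lets a CDN or reverse proxy absorb most requests before they ever reach the server.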
You can always lie about some of the prices if you want, but you will just encourage bots again.
Ethics are nice but, let's be honest, in very short supply. Sometimes it's better to be pragmatic.
There's the problem right there. The prices and product data are publicly visible because there is a target audience of /humans/ that the site is designed for and intended to be used by. The site is not there to cater for a competitor's scrapers.
I don't care how much people couch their unethical behaviour in "the data is publicly available", the basic fact is most if not all websites exist for human eyeballs to look at them. They do not exist for arseholes to DoS them by inundating them with scrapers.
But overall, information is a good with intrinsic properties like no other. It can be copied infinitely. And we haven't yet figured out how to reason about its dynamics, so it feels like we're pretending it's a physical good.
Edit, side note: I'd go further and say that some of the data is even worse. It's "offered" with the real intention of confusing users into performing non-optimally in the market. Look at Amazon/eBay/AliExpress/Google listings for evidence of that. Just take Google - Google is an ML and scraping powerhouse, and the best they can muster is to be spammed with fake websites and duplicate/confusing listings.
There's a whole ethical subthread here about websites trying to make the experience miserable for those humans, and taking away the agency necessary to protect oneself from that. A browser is a user agent. So is a screen reader. So is a script one writes to not deal with bullshit fluff, when all one wants is a simple table of products, features and prices.
Your argument is perfectly valid and applies to offline activities as well (what stops a competitor from walking through the aisles of a Walmart or Costco?), but this is a battle that can't be won: there are too many parasitic actors. It is human nature.
Because otherwise the HTML will become the API.
Valuable public data is going to be scraped - this is inevitable. Even paywalled or signup protected valuable data is going to be scraped.
Why not sell valuable data for a reasonable price, then?