What's a page that's made of JSON? And isn't it less semantic and more dependent on JS to convert the page into a readable representation, than a pre-rendered static site?
These YT pages rely on automation. The browser runs Javascript to format the page and to make HTTP requests that send data back to Google (a privacy violation, with no user benefit). It loads thumbnail images automatically. Many steps have been automated. The JS is of course not written by the user but by Google, to support its data-mining and advertising business.
However, it is also possible to retrieve a web page and perform the necessary steps manually, without Google's "help". Instead of letting a browser do whatever the website's Javascript programmers want it to do (to suit advertising interests, not user interests), the user controls the process, performing the steps manually.
This is how I approach YouTube and other convoluted websites. I retrieve the page to memory (tmpfs) using a relatively simple TCP client plus a local TLS proxy (no gigantic web browser is needed for such a simple task). I do not retrieve the separate Google Javascript files (which a JS-enabled browser will automatically request); there is no need for them, as they exist to manipulate the user in Google's interests. (I am not interested in commercial videos, nor in Google's JS "video player"; I do not use a mouse.) As the page is mostly JSON, it is not formatted to be easily readable on screen. I reformat it manually, using tr and sed. Then I extract the bits I want from the text, i.e., playback URLs, and various metadata such as video IDs, descriptions, durations, suggestions, views, likes, channels (if any), time since upload, thumbnail URLs, the continuation token, etc. Then I make a subsequent HTTP request if I want something further.
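The tr-then-sed step can be sketched roughly like this. This is not the author's exact pipeline (those commands aren't shown); it's a minimal illustration on a tiny sample string standing in for the retrieved page, using tr to break the compact JSON into lines and sed to pull out the videoId values:

```shell
# Hypothetical sample standing in for a retrieved YouTube page (mostly JSON).
page='{"videoId":"abc123","title":"demo","videoId":"xyz789"}'

# Split on commas so each field lands on its own line, then extract
# the value of every "videoId" field. Other fields (durations, views,
# continuation token, ...) would be pulled out the same way.
printf '%s' "$page" \
  | tr ',' '\n' \
  | sed -n 's/.*"videoId":"\([^"]*\)".*/\1/p'
```

The same pattern, with different sed expressions, yields the playback URLs and other metadata; the extracted URL then becomes the target of the next manual HTTP request.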
By contrast, using a "modern" Javascript-enabled browser controlled by an advertising-funded organisation to retrieve a page from YouTube results in all manner of privacy intrusion. Even when a page is simply left open in the browser, without any interaction at all, the Google JS triggers constant HTTP requests, some of them empty (zero benefit to the user; the user would never intentionally make such requests).
The amount of code needed for a fully automated Javascript-enabled browser is gigantic. The program is a security nightmare. The amount of code needed for youtube-dl is also relatively large; IME the distributed binary can take over 7 seconds to start up. The amount of code I need is, by comparison, tiny: only sed, tr and a TCP client. Everything fits on a single page. Fast and reliable.
This is overblown.
Serious question ... how do you propose these companies that provide these services make money?
It's not free to buy yottabytes of space to archive every video we're uploading. Somebody has to send a paycheck to all the employees who make this happen.
If the response is "pay for it", the entire business model collapses because the vast majority won't pay for it.
And I'm no apologist here. I'm running uBlock Origin in Brave and similar things for a reason.
How about micropayments per article / video / whatever? E.g. using the Bitcoin Lightning Network.
I'd choose that over ads without doubt.
Is it really? I don't doubt it's fast, but in my experience it's very hard, if not impossible, to scrape these "mostly JSON" services without the whole thing failing spectacularly as soon as the site shuffles their data model a bit.
Maybe YouTube is more amenable to this than the services I'm thinking of -- I personally have no qualms deferring to youtube-dl so I wouldn't know -- but for things like my university's lecture recordings my efforts seem to consistently be mooted within a matter of months or weeks.