spideymans 4y ago

>But of course, there is no way to create a reader mode that works for all web pages -- and that fact is one of the main arguments for competing with the web rather than trying to improve or fix the web.

Web developers go out of their way to break Reader Mode. What incentive would they have to create content for a platform that competes with the web, which actively limits their ability to deliver ads and degrade the reading experience?

In any case, I imagine that an ML-powered Reader Mode engine would perform significantly better than one that relies on semantic markup. It should be fairly straightforward to train an ML model to visually distinguish the actual content from the ads and other garbage that pollute webpages. Crowdsourced training data would increase the accuracy even further, while limiting the ability of developers to defeat the reader mode.
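To make the idea concrete, here is a minimal sketch of a content-vs-clutter classifier. It is not a real reader mode: instead of a visual model it uses a tiny logistic regression over two hand-picked features per page block, and the feature choice, the training rows, and every name in it (`TRAIN`, `train`, `is_content`) are assumptions made up for illustration. In practice the labels would come from something like the crowdsourced data mentioned above.

```python
import math

# Hypothetical per-block features: (link_density, word_count / 100).
# Label 1 = article content, 0 = ad or navigation clutter.
# These rows are invented for illustration, not measured from real pages.
TRAIN = [
    ((0.05, 1.2), 1), ((0.10, 0.8), 1), ((0.02, 2.0), 1), ((0.08, 1.5), 1),
    ((0.90, 0.1), 0), ((0.70, 0.2), 0), ((0.95, 0.05), 0), ((0.60, 0.15), 0),
]

def predict(weights, bias, features):
    """Return the estimated probability that a block is article content."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def is_content(weights, bias, features):
    """Keep a block in reader mode if it is more likely content than clutter."""
    return predict(weights, bias, features) >= 0.5

WEIGHTS, BIAS = train(TRAIN)
```

A reader mode built this way would call `is_content(WEIGHTS, BIAS, features)` on each block and drop the ones that fail. A model that actually looks at the rendered page, as suggested above, would need far richer inputs than these two numbers.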

1 comment · 1 top-level

hollerith 4y ago

>Web developers go out of their way to break Reader Mode. What incentive would they have to create content for a platform that competes with the web, which actively limits their ability . . . ?

It is unrealistic to expect most publishers of web pages -- or even most publishers of mostly-static text-heavy web pages -- to put their content on a competitor. It is still worthwhile IMO to try to steal from the web 1% or 2% of the web's user-hours ("person-hours"?) and that can probably be done (over say 5 years) with a small fraction of the web's static textual content (plus a little content that is not available on the web).

My guess is that the largest beneficial effect (on the world) of doing that will be to convince more people that the web is a mess (by showing them a less messy alternative even though they will continue to spend most of their online hours on the web).

Technically inclined people and people who used the web (or maybe Usenet) in the 1990s either already know that the web is a mess or cannot be persuaded of it regardless of what we do, but perhaps the most important effect of stealing 1% of the web's user-hours is convincing more people outside those two groups.

P.S. I have already noticed the potential for machine learning to help users of the web: https://news.ycombinator.com/item?id=27856814