Web developers go out of their way to break Reader Mode. What incentive would they have to create content for a platform that competes with the web and actively limits their ability to serve ads and degrade the reading experience?
In any case, I imagine that an ML-powered Reader Mode engine would perform significantly better than one that relies on semantic markup. It should be fairly straightforward to train an ML model to visually distinguish the actual content from the ads and other garbage that pollute webpages. Crowdsourced training data would increase the accuracy even further, while limiting developers' ability to defeat Reader Mode.