> Unfortunately, the days of reliable non-JavaScript capable scraping are over.
Not really. In a lot of cases websites use JavaScript to call some API, along with a token generated on the fly to prevent abuse.
As long as that token isn't a CAPTCHA, you can reverse engineer the site and scrape it without JavaScript, and that is so much faster than browser-based scraping.
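To make that concrete, here's a rough sketch of what it looks like once you've worked out how the token is built. Everything specific here is made up for illustration: the HMAC-SHA256 scheme, the `SECRET` value, the `X-Token` / `X-Timestamp` header names and the example.com URL are all assumptions, and a real site will do something different.

```python
import hashlib
import hmac
import time
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Hypothetical: a key you would have dug out of the site's JS bundle.
SECRET = b"key-found-in-bundle"


def make_token(path: str, ts: int, secret: bytes = SECRET) -> str:
    """Recreate the assumed client-side token: HMAC-SHA256 over 'path:timestamp'."""
    msg = f"{path}:{ts}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def build_headers(url: str, ts: Optional[int] = None) -> Dict[str, str]:
    """Build the headers the API is assumed to check (header names are hypothetical)."""
    ts = int(time.time()) if ts is None else ts
    path = urllib.parse.urlparse(url).path
    return {"X-Token": make_token(path, ts), "X-Timestamp": str(ts)}


def fetch(url: str) -> bytes:
    """Plain HTTP request with the regenerated token; no browser involved."""
    req = urllib.request.Request(url, headers=build_headers(url))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

With something like this, `fetch("https://example.com/api/items?page=2")` is a single HTTP round trip instead of a full page render.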
I agree with this. This is what I see on a lot of sites I scrape. Reverse engineering the JS to figure out how the fuck the token was generated is a bitch though.
So then you use a headless browser to render the JS, which is even hackier, but it's totally worth spending one extra full page load to get the token so you can go back to plain requests.
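A minimal sketch of that pattern, assuming the token stays valid for some fixed time: pay for the browser once, cache the token, and only go back to the browser when it expires. `TokenCache` and `fetch_token` are hypothetical names; in practice `fetch_token` would be whatever function drives your headless browser and pulls the token out of the page.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a token and refetches it only after it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],  # the expensive call (e.g. headless browser)
        ttl_seconds: float,              # assumed token lifetime
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it if missing or expired."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token

    def invalidate(self) -> None:
        """Drop the token early, e.g. after the API rejects a request."""
        self._token = None
```

Every plain request then calls `cache.get()` for its token, so the browser only runs once per token lifetime instead of once per page.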
I don't think the token is the only thing that comes into play here. If a company wants to, it can use various other techniques like browser fingerprinting, TLS fingerprinting, and a lot of other things.
It's just a cat-and-mouse game. In a few years I think hardware attestation etc. will come into play, which could mitigate the bot issue somewhat.