Hey everyone,
we built a special-purpose web scraping client for Node.js. When scraping with pure HTTP clients, you want to blend in with the regular traffic as much as you can. This means your request signature needs to look like a browser's.
With got-scraping, we developed a special-purpose header generator (https://github.com/apify/header-generator) that uses a Bayesian network and real browser headers to make your headers indistinguishable from a real browser's.
We also override the Node.js TLS ciphers with the browser ones and simplify the use of proxies. HTTP protocol versions are auto-detected for both the target website and the proxy, so you can have a perfect HTTP/2 connection even through an HTTP(S) proxy.
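To give a rough idea of what the cipher override means, here is a minimal sketch using only Node.js built-ins. It is not got-scraping's actual code: the cipher list is an illustrative assumption rather than the list the library ships, and `createBrowserLikeAgent` is a hypothetical helper name.

```javascript
const https = require('https');
const tls = require('tls');

// Illustrative browser-like cipher order (an assumption for this sketch,
// not the list got-scraping actually uses).
const BROWSER_LIKE_CIPHERS = [
  'TLS_AES_128_GCM_SHA256',
  'TLS_AES_256_GCM_SHA384',
  'TLS_CHACHA20_POLY1305_SHA256',
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'ECDHE-RSA-AES256-GCM-SHA384',
  'ECDHE-ECDSA-CHACHA20-POLY1305',
  'ECDHE-RSA-CHACHA20-POLY1305',
].join(':');

// Throws if OpenSSL rejects the cipher string, so typos surface early.
tls.createSecureContext({ ciphers: BROWSER_LIKE_CIPHERS });

// Hypothetical helper: an HTTPS agent whose TLS connections are
// configured with the cipher list above instead of the Node.js default.
function createBrowserLikeAgent() {
  return new https.Agent({ ciphers: BROWSER_LIKE_CIPHERS, keepAlive: true });
}

const agent = createBrowserLikeAgent();
```

You would then pass the agent to a request, for example `https.get(url, { agent }, callback)`. With got-scraping you don't need any of this, since the override is applied for you.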
It's always a work in progress, so we would be grateful for any comments or tips on how to make the requests even more stealthy!
Thanks!