Show HN: Crawlee – Web scraping and browser automation library for Node.js

(crawlee.dev)

282 points · jancurn · 3y ago · 80 comments

Hey HN,

This is Jan, founder of Apify, a web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee [1], the web scraping and browser automation library for Node.js that's designed for the fastest development and maximum reliability in production.

For details, see the short video [2] or read the announcement blog post [3].

Main features:

- Supports headless browsers with Playwright or Puppeteer

- Supports raw HTTP crawling with Cheerio or JSDOM

- Automated parallelization and scaling of crawlers for best performance

- Avoids blocking using smart sessions, proxies, and browser fingerprints

- Simple management and persistence of queues of URLs to crawl

- Written completely in TypeScript for type safety and code autocompletion

- Comprehensive documentation, code examples, and tutorials

- Actively maintained and developed by Apify—we use it ourselves!

- Lively community on Discord

To get started, visit https://crawlee.dev or run the following command: npx crawlee create my-crawler
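As an illustration only, the two features above about URL queues and automated parallelization can be sketched in a few lines of TypeScript. This is not Crawlee's code or API: `TinyQueue`, `crawl`, `Handler`, and the `maxConcurrency` parameter are hypothetical names invented for this sketch, and the handler receives a bare URL rather than a fetched page. It shows a deduplicating queue drained by a bounded number of concurrent handlers, where each handler may enqueue newly discovered URLs.

```typescript
// Hypothetical sketch, NOT Crawlee's implementation or API.
type Handler = (url: string, enqueue: (url: string) => void) => Promise<void>;

// A FIFO queue that silently ignores URLs it has already seen.
class TinyQueue {
  private pending: string[] = [];
  private seen = new Set<string>();

  add(url: string): boolean {
    if (this.seen.has(url)) return false; // duplicate, skip
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }

  isEmpty(): boolean {
    return this.pending.length === 0;
  }
}

// Runs `handler` on every reachable URL, at most `maxConcurrency` at a time.
// Resolves with the list of URLs whose handler completed.
function crawl(
  startUrls: string[],
  handler: Handler,
  maxConcurrency = 2,
): Promise<string[]> {
  const queue = new TinyQueue();
  startUrls.forEach((u) => queue.add(u));
  const visited: string[] = [];
  let active = 0;

  return new Promise((resolve, reject) => {
    const pump = (): void => {
      // Done only when nothing is queued AND nothing is still running,
      // because a running handler may still enqueue more URLs.
      if (active === 0 && queue.isEmpty()) {
        resolve(visited);
        return;
      }
      while (active < maxConcurrency) {
        const url = queue.next();
        if (url === undefined) break;
        active++;
        handler(url, (u) => queue.add(u))
          .then(() => {
            visited.push(url);
          })
          .catch(reject)
          .finally(() => {
            active--;
            pump(); // a slot freed up, try to start more work
          });
      }
    };
    pump();
  });
}
```

A real crawler adds much more on top of this (retries, persistence, sessions, proxies), which is the part a library is meant to handle for you.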

If you have any questions or comments, our team will be happy to answer them here.

[1] https://crawlee.dev/

[2] https://www.youtube.com/watch?v=g1Ll9OlFwEQ

[3] https://blog.apify.com/announcing-crawlee-the-web-scraping-a...

72 comments · 25 top-level

franga2000 · 3y ago · 6 in thread

Looks like you took the good ideas from Scrapy's crawling engine and combined them with a great scraping API, which is all I ever wanted in a bot framework!

I'm especially excited about the unified API for browser and HTML scraping, which is something I've had to hack on top of Scrapy in the past and it really wasn't a good experience. That, along with puppeteer-heap-snapshot, will make the common case of "we need this to run NOW, you can rewrite it later" so much easier to handle.

While I'm not particularly happy to see JavaScript begin taking over another field, as it truly is an awful language, more choice is always better, and this project looks valuable enough to make dealing with JS a worthwhile tradeoff.