Show HN: WWW search powered by OSS and common data

(crawlcrawler.com)

2 points · crawlcrawler · 6y ago · 1 comment


Hi HN, I'm Marcus Lager. I'd like to present Crawl Crawler, a search engine powered by OSS and common data. It takes a few extra steps before you get excellent search results.

Crawl Crawler is as much a JSON HTTP API for private and corporate data-hungry text projects as it is plain ol' non-tracking, keyword-based web search, with results marked up as ad-free, non-dynamic HTML and served cookie-free.

Crawl Crawler lets you search four large sources of data, plus your own: the Common Crawl metadata, text, and HTML repositories, as well as the WWW itself.

Use Crawl Crawler to:

- find the data you need from Common Crawl or the WWW, for whatever reason, in JSON or HTML, on any device.

- create and periodically refresh your indices, perhaps from your favorite parts of the web.

- periodically crawl your site and replace your in-site search engine with HTTP requests to Crawl Crawler.

- execute natural language or structured queries.

- build apps.

- take part in the enrichment process. The more you enrich Crawl Crawler, the wider, deeper, and more current its indices become, in real time.
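
To make the in-site search use case a bit more concrete, here is a minimal Python sketch of how a client might build a query URL for a JSON HTTP search API and read the results. Everything specific in it is an assumption for illustration only: the placeholder host `example.invalid`, the `/search` path, the `q` and `site` parameter names, and the `hits`, `title`, and `url` response fields are hypothetical and are not taken from the Crawl Crawler API. The sample response is made up so the sketch runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, NOT the real Crawl Crawler API.
BASE_URL = "https://example.invalid/search"


def build_search_url(query, site=None, base_url=BASE_URL):
    """Build a GET URL for a keyword query, optionally limited to one site."""
    params = {"q": query}  # "q" is an assumed parameter name
    if site is not None:
        params["site"] = site  # "site" is an assumed parameter name
    # urlencode percent-escapes reserved characters and encodes spaces as "+".
    return base_url + "?" + urlencode(params)


def extract_hits(response_body):
    """Return (title, url) pairs from a JSON body with an assumed shape."""
    payload = json.loads(response_body)
    return [(hit["title"], hit["url"]) for hit in payload.get("hits", [])]


# Made-up response body, used only to show the parsing step offline.
SAMPLE_BODY = '{"hits": [{"title": "Docs", "url": "https://example.com/docs"}]}'

url = build_search_url("install guide", site="example.com")
hits = extract_hits(SAMPLE_BODY)
print(url)
print(hits)
```

In a real integration, the site's search box would send the user's query to whatever endpoint the service documents and render the returned hits, so the two helpers above would need to be adapted to the actual parameter names and response format.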

Ask me anything.
