Hi HN, I'm Marcus Lager. I'd like to present Crawl Crawler, a search engine built on open-source software and common data, where a few extra steps are involved before you get excellent search results.
Crawl Crawler is as much a JSON HTTP API for data-hungry private and corporate text projects as it is plain ol' keyword-based web search without tracking: results are marked up as ad-free, static HTML and served cookie-free.
Crawl Crawler lets you search four major sources of data, plus your own: the Common Crawl metadata, text, and HTML repositories, as well as the WWW itself.
Use Crawl Crawler to:
- find the data you need from Common Crawl or the WWW, for whatever purpose, in JSON or HTML, on any device.
- create your own indices, perhaps from your favorite parts of the web, and refresh them periodically.
- crawl your site periodically and replace your in-site search engine with HTTP requests to Crawl Crawler.
- execute natural language or structured queries.
- build apps.
- take part in the enrichment process: the more you enrich Crawl Crawler, the wider, deeper, and more current its indices become, in real time.
Ask me anything.