Better HN

nolite · 15y ago · 0 points

Is it possible to give us a rough overview of your tech infrastructure? (Servers, processes, storage)?



jbr · 15y ago

Sure. This probably will turn into a blog entry, but here's the gist:

  4 medium sized linodes, divided as follows:
  1 app:    unicorn (rails), nginx
  1 db:     redis, solr, mysql
  2 worker: resque workers (both ruby/rails and node.js) & the crawler

The crawler is written in node.js, backed by redis. When it finds a new page, it downloads the page to shared local storage and adds a task to a resque queue monitored by the rails workers. Those workers add a row to a mysql table that serves as the permanent record of the page, use nokogiri to extract the body content and any metadata, index it into solr, delete the local copy, and upload the page to an s3 archive. When you request a page, rails queries solr.
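A minimal Python sketch of that flow, with in-memory stand-ins for every service: a set for the redis-backed "already seen" check, a dict for shared local storage, a deque for the resque queue, sqlite3 for mysql, dicts for the solr index and the s3 archive, and the stdlib `html.parser` in place of nokogiri. All function and variable names are hypothetical; the real system runs these steps as separate node.js and ruby processes on different boxes.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser

# In-memory stand-ins for the real services (assumptions for illustration):
seen_urls = set()        # redis set the crawler checks before queuing a page
local_store = {}         # shared local storage: url -> raw HTML
job_queue = deque()      # resque queue of URLs waiting for a worker
search_index = {}        # solr: page id -> indexed document
archive = {}             # s3 archive: url -> raw HTML
db = sqlite3.connect(":memory:")  # mysql: permanent record of each page
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE)")


class BodyExtractor(HTMLParser):
    """Pulls the <title> and visible text out of a page (nokogiri's role)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text.append(data)


def crawl(url, html):
    """Crawler: ignore known URLs; store new pages locally and enqueue a job."""
    if url in seen_urls:
        return False
    seen_urls.add(url)
    local_store[url] = html
    job_queue.append(url)
    return True


def process_page(url):
    """Worker: record the page, extract, index, drop the local copy, archive."""
    html = local_store[url]
    page_id = db.execute("INSERT INTO pages (url) VALUES (?)", (url,)).lastrowid
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    search_index[page_id] = {"url": url, "title": parser.title,
                             "body": " ".join(parser.text)}
    del local_store[url]
    archive[url] = html
    return page_id


def search(term):
    """Request side: a naive substring match standing in for a solr query."""
    term = term.lower()
    return [doc["url"] for _, doc in sorted(search_index.items())
            if term in doc["title"].lower() or term in doc["body"].lower()]


crawl("http://example.com/a",
      "<html><head><title>Hello</title></head><body><p>Breaking news</p></body></html>")
crawl("http://example.com/a", "<p>same URL again</p>")  # already seen, skipped
while job_queue:
    process_page(job_queue.popleft())
print(search("breaking"))  # ['http://example.com/a']
```

Keeping the "seen" check on the crawler side means a page is queued at most once, so the workers never have to deduplicate; the queue is the only hand-off between the two halves.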

nolite (OP) · 15y ago

Nice, thanks. Any stats on how fast it is? It looks pretty fast, judging by how quickly the web page updates.

jbr · 15y ago

Haven't built in much monitoring yet, but from watching the resque web interface, most of the delay is in actually finding the new page; from there to it showing up on your screen takes no more than a second or two. For subscribed users we send email notifications, and we almost always beat Google News alerts, often by around 15 minutes.
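Since most of the delay is in discovery, a per-page timeline would make that split measurable instead of eyeballed from the resque web interface. A minimal sketch, assuming each page gets a wall-clock timestamp at each stage; the `PageTimeline` class, the stage names, and taking a "published" time from the page's feed are hypothetical, not part of the system described above.

```python
import time


class PageTimeline:
    """Records when one page reaches each stage and reports the gaps."""

    def __init__(self, url):
        self.url = url
        self.stamps = {}  # stage name -> wall-clock seconds, in insertion order

    def mark(self, stage, at=None):
        """Record a stage; defaults to the current wall-clock time."""
        self.stamps[stage] = time.time() if at is None else at

    def gaps(self):
        """Seconds elapsed between each pair of consecutively recorded stages."""
        stages = list(self.stamps)
        return {f"{a} -> {b}": self.stamps[b] - self.stamps[a]
                for a, b in zip(stages, stages[1:])}


t = PageTimeline("http://example.com/a")
t.mark("published", at=0.0)     # e.g. taken from the page's feed timestamp
t.mark("discovered", at=840.0)  # crawler finds the page
t.mark("indexed", at=841.5)     # worker finishes indexing it
print(t.gaps())  # {'published -> discovered': 840.0, 'discovered -> indexed': 1.5}
```

Wall-clock time is used rather than a monotonic clock because the "published" stamp would come from outside the system and has to be comparable with the others.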

