Sure. This will probably turn into a blog entry, but here's the gist:
4 medium-sized linodes, divided as follows:
1 app: unicorn (rails), nginx
1 db: redis, solr, mysql
2 worker: resque workers (both ruby/rails and node.js) & the crawler
The crawler is written in node.js, backed by redis. When it finds a new page, it downloads it to shared local storage and adds a task to a resque queue monitored by the rails workers. The workers add a row to a mysql table that represents the permanent record of the page, use nokogiri to extract the body content and any metadata, index it into solr, delete the local copy, and upload the page to an s3 archive. When you request the page, rails asks solr.
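For the crawler-to-resque handoff, here's a rough sketch of how a node.js process can enqueue a job that ruby resque workers will pick up. It relies on resque's storage format: each job is a JSON string with "class" and "args", pushed onto the redis list "resque:queue:<name>", with the queue name recorded in the set "resque:queues". The worker class name (ProcessPage), the queue name (pages), the argument order, and the FakeRedis stand-in are all made up for illustration, not the actual setup.

```javascript
// Build the JSON payload resque expects for one job.
// "workerClass" must match the name of a ruby worker class.
function buildResqueJob(workerClass, args) {
  return JSON.stringify({ class: workerClass, args: args });
}

// Enqueue a downloaded page for processing.
// `redis` is any client exposing async sadd(key, member) and rpush(key, value).
// "ProcessPage" is a hypothetical worker class name.
async function enqueuePage(redis, queue, url, localPath) {
  const payload = buildResqueJob("ProcessPage", [url, localPath]);
  await redis.sadd("resque:queues", queue);            // register the queue
  await redis.rpush("resque:queue:" + queue, payload); // append the job
  return payload;
}

// In-memory stand-in for a redis client, so the sketch runs without a server.
class FakeRedis {
  constructor() {
    this.sets = new Map();
    this.lists = new Map();
  }
  async sadd(key, member) {
    if (!this.sets.has(key)) this.sets.set(key, new Set());
    this.sets.get(key).add(member);
  }
  async rpush(key, value) {
    if (!this.lists.has(key)) this.lists.set(key, []);
    this.lists.get(key).push(value);
  }
}

// Example: the crawler has just saved a page to shared storage.
const fake = new FakeRedis();
enqueuePage(fake, "pages", "http://example.com/", "/shared/pages/abc.html");
```

On the ruby side, a worker class with the same name, listening on the same queue, would receive the URL and the local path as the arguments to its perform method.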