I usually start a crawl with a list of the top one million websites according to Alexa.com. Unfortunately that list doesn't seem to be available for download anymore, so I have to make do with the latest one I have, which is from April 2014. After that, URLs get crawled in first-come, first-served order: I simply crawl them in the order I find them.
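To illustrate that ordering, here is a minimal sketch of a first-in, first-out crawl frontier. It is not my actual crawler code: the names `crawl_order`, `fetch_links` and `toy_web` are made up for this example, and a small in-memory link graph stands in for real HTTP fetching and for the seed list.

```python
from collections import deque


def crawl_order(seeds, fetch_links, limit=1000):
    """Return URLs in the order a simple FIFO crawler would visit them.

    seeds       -- iterable of start URLs (e.g. a top-sites list)
    fetch_links -- hypothetical callable: URL -> list of URLs found on that page
    limit       -- stop after visiting this many URLs
    """
    frontier = deque()  # URLs waiting to be crawled, oldest first
    seen = set()        # every URL ever queued, so nothing is queued twice

    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()  # take the URL that was discovered earliest
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)  # newly found URLs go to the back
    return visited


# Toy link graph standing in for the real web (assumption for the example).
toy_web = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["d.example"],
    "c.example": ["a.example", "e.example"],
}

order = crawl_order(["a.example", "b.example"], lambda u: toy_web.get(u, []))
print(order)
```

The seeds are visited first, in list order, and every URL discovered afterwards is appended to the back of the queue, so nothing is prioritized beyond the order of discovery.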
Interpreting ambiguous words for ranking is beyond what I can do at the moment.