The list of sponsoring orgs for each TLD is at: http://www.iana.org/domains/root/db
Anyway, if you want a copy of the domains from the index file, just send a mail to the address in my profile.
Crawling is much faster if you don't have to "spider" all the links and check if you've already visited them or not.
With a big enough list, you can just iterate over those domains and gather stats (average number of links per site, how often JavaScript framework X gets used vs. Y, how many sites have an HTML5 doctype yet, ...).
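To make the "just iterate over the list" idea concrete, here is a rough sketch in Python. It is only an illustration: the function names (`fetch_front_pages`, `summarize`, etc.), the one-request-per-domain approach and the 10 second timeout are all my own assumptions, not anything taken from the index file or from Common Crawl. It fetches each domain's front page once, with no spidering and no visited-set, and computes two of the stats mentioned above.

```python
import http.client
import re
import urllib.request
from html.parser import HTMLParser

# Matches the short HTML5 doctype (<!DOCTYPE html>) at the start of a page.
# Older doctypes with a PUBLIC identifier deliberately do not match.
HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(html):
    """Return True if the page starts with the HTML5 doctype."""
    return HTML5_DOCTYPE.match(html) is not None


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def count_links(html):
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_front_pages(domains, timeout=10):
    """Yield (domain, html) for every domain whose front page could be fetched.

    Hypothetical helper: one plain HTTP request per domain, failures skipped.
    """
    for domain in domains:
        try:
            with urllib.request.urlopen("http://" + domain + "/", timeout=timeout) as resp:
                body = resp.read()
        except (OSError, ValueError, http.client.HTTPException):
            continue  # unreachable, malformed or broken response: skip it
        # latin-1 never fails to decode; good enough for counting tags
        yield domain, body.decode("latin-1")


def summarize(pages):
    """Aggregate stats over an iterable of (domain, html) pairs."""
    sites = html5 = links = 0
    for _domain, html in pages:
        sites += 1
        html5 += has_html5_doctype(html)
        links += count_links(html)
    return {
        "sites": sites,
        "html5_share": html5 / sites if sites else 0.0,
        "avg_links": links / sites if sites else 0.0,
    }
```

Usage would be something like `summarize(fetch_front_pages(line.strip() for line in open("domains.txt")))`, where `domains.txt` is a placeholder for whatever list you have. For a really big list you would want to run the fetches in parallel, but the loop itself stays this simple.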
*Edit: The problem seems to be fixed now.
wget https://s3.amazonaws.com/aws-publicdatasets/common-crawl/crawl-001/2008/06/19/0/1213886083018_0.arc.gz
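
Once the download works, a quick way to check that you got a real gzip file and not an error page is to print its first few lines. This is a minimal sketch: `peek_lines` is a hypothetical helper, and it assumes the decompressed data starts with plain-text header lines, which I have not verified for this particular file.

```python
import gzip


def peek_lines(fileobj, n=5):
    """Return the first n decompressed lines from a gzip-compressed binary stream."""
    lines = []
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        for raw in gz:
            if len(lines) >= n:
                break
            # latin-1 never fails to decode, so binary junk will not raise here
            lines.append(raw.decode("latin-1").rstrip("\r\n"))
    return lines
```

For example: `with open("1213886083018_0.arc.gz", "rb") as f: print(peek_lines(f))`. If the file is not valid gzip, reading from it raises an error instead of returning lines.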