Reading about the Norvig library on HN or Google Groups, you notice that the data is not easily available. I would like to have a plain-text description of what I see on a web page, but I think they prefer to keep the data hard to get at. So we wait another century for the semantic web, or for some big enterprise to collect all the data; that is, the winner takes it all.