I can't afford to buy all that plus shipping to Europe, but I would like to play with the data for my NLP project.
Remember the days when people used to make links on the web because they weren't greedy with their pagerank?
At least Google left us some machine learning data sets after they took all the links. You just can't find them because nobody links to them.
http://blog.bigml.com/2013/02/28/data-data-data-thousands-of...
http://tm.durusau.net/?p=39312
http://dvn.iq.harvard.edu/dvn/
_____________
The r/datasets subreddit (last link below) seems like a decent place to ask questions:
http://commondatastorage.googleapis.com/books/icdar2007/READ...
https://explore.data.gov/catalog/raw/
http://www.data.gov/opendatasites
http://glasspockets.org/work/reportingcommitment/api.html
http://thedata.harvard.edu/dvn/
http://www.reddit.com/r/datasets
When playing with a new programming language, instead of a 'todo' list I always end up building an XKCD password generator. Interestingly enough, I've never found a frequency/comprehension list worth using to populate it for public consumption.
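
For anyone who hasn't built one, here is a minimal sketch of that kind of generator in Python. The `SAMPLE_WORDS` list and the `xkcd_password` function name are my own placeholders, not from any particular library; a real version would load a much larger frequency-ranked word list, which is exactly the part that is hard to find.

```python
import secrets

# Hypothetical stand-in word list (an assumption for illustration only).
# A usable generator needs thousands of commonly understood words.
SAMPLE_WORDS = [
    "correct", "horse", "battery", "staple",
    "orange", "window", "river", "candle",
]


def xkcd_password(words, count=4, separator=" "):
    """Join `count` words chosen uniformly at random (with replacement)."""
    if not words:
        raise ValueError("word list must not be empty")
    # secrets.choice draws from the OS's cryptographically secure RNG,
    # unlike random.choice, which is not meant for passwords.
    return separator.join(secrets.choice(words) for _ in range(count))


if __name__ == "__main__":
    print(xkcd_password(SAMPLE_WORDS))
```

The strength comes entirely from the size of the list: each word contributes log2(len(words)) bits, so the eight-word sample above is only good for demonstration.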