Go to https://hn-archive.appspot.com/ for the torrent file / source code.
I'll be semi-frequently checking the story and answering any questions which may come up.
What date range does this correspond to? How big is the archive?
The archive is 1.12GB and contains one JSON document per line. Each document is approximately in the format returned by the official HN API, with some exceptions: some of the comments are not available through the official API and had to be retrieved through the Algolia API and/or scraped from the site.
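For anyone who wants to poke at the one-document-per-line layout, here is a minimal sketch of a streaming reader. It assumes each item carries a `type` field like the official HN API does; since the archive only approximately follows that format, missing fields are counted as "unknown". The file name in the usage comment is a placeholder, not the real name of the archive.

```python
import json
from collections import Counter


def iter_items(lines):
    """Yield one parsed JSON document per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        yield json.loads(line)


def count_types(lines):
    """Count items by their "type" field (assumed, as in the official HN API)."""
    return Counter(item.get("type", "unknown") for item in iter_items(lines))


# Usage (hypothetical file name), reading line by line so the whole
# archive never has to fit in memory:
#
#     with open("hn-archive.json", encoding="utf-8") as f:
#         print(count_types(f))
```

Reading through an iterator rather than loading everything at once matters here because the decompressed file is several gigabytes.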
If anyone else is interested, here is the (terrible) code to get it into a prototype format: https://gist.github.com/binarymax/d3691180e65ff7f0dec5
As a side note, not really having looked too deeply into word2vec, does word2vec capture multiple meanings? If so, how?
Starting training using file 10m.txt
Vocab size: 305432
Words in train file: 565170189
Alpha: 0.000045 Progress: 99.91% Words/thread/sec: 107.57k
real 174m19.955s
user 1315m35.661s
sys 3m27.011s
Enter word or sentence (EXIT to break): startup
Word: startup Position in vocabulary: 390
Word Cosine distance
------------------------------------------------------------------------
startups 0.808231
bootstrapped 0.719379
entrepreneur 0.707722
starup 0.698379
bootstrapping 0.698216
incubator 0.683647
founders 0.664983
scrappy 0.660502
entrepreneurs 0.660176
entrepreneurial 0.656120
yc 0.652160
cofounder 0.651848
vc 0.650642
fledgling 0.636813
cofounders 0.632761
venture 0.622636
company 0.617562
incubators 0.612947
statup 0.608451
founder 0.608080
entrepreneurship 0.604812
sv 0.603689
bigco 0.602171
startuppers 0.592669
cofounded 0.588964
entrepeneurs 0.585747
solo 0.582533
entreprenuers 0.564045
boostrapped 0.562884
solopreneurs 0.559994
cofounding 0.559840
statups 0.558347
business 0.552922
bootstrapper 0.551885
techstars 0.545766
bootstrappers 0.545263
fintech 0.545090
fundable 0.542542
shotput 0.541257
accelerator 0.540787
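For reference, the score in the second column behaves like a cosine similarity (higher means closer), even though the tool labels it "Cosine distance". Below is a rough sketch of how such a ranking can be computed. The three 2-d vectors are made up purely for illustration and have nothing to do with the trained model, and `nearest` is a hypothetical helper, not part of word2vec.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(word, vectors, top_n=5):
    """Rank every other word by cosine similarity to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, invented for this example only.
toy_vectors = {
    "startup": [1.0, 0.0],
    "startups": [0.9, 0.1],
    "banana": [0.0, 1.0],
}

for other, score in nearest("startup", toy_vectors):
    print(f"{other}\t{score:.6f}")
```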
Also the file is ~5.3GB when decompressed, if anyone is wondering.
Some day!