What kind of data would satisfy you? I imagine any data coming directly from YC would be untrustworthy, and third-party data would be incomplete (say, it wouldn't catch content removed before it was published).
Is there a similar data set for other private platforms?