Thanks for the background. I'm working on a similar project, but I'm currently parsing news articles from a collection of specific RSS feeds and calling Google's NLP API with the text. It sounds like AlchemyAPI might be a better fit in this case.
How are you finding Neo4j handles the scale of reading and writing all these stories? I've had a positive experience so far, but I'm only in the few-thousand range.