Right, I will definitely work on adding a data policy to the terms of service. I think it will go along the lines of: you cannot remove the attributions/trademarks if you are republishing the data, but doing analytics on it is OK.
It's a pretty fine line you're treading here. A large part of the reason the competition is so expensive is that they license the data in bulk. There's a lot of litigation in this area; see the Meltwater cases in the US and the UK.
If you have, say, 10k news sources from dozens of countries around the globe, I doubt it would be feasible to sign a contract with every individual site out there, especially if you get your data from, say, Google News.