Circa is a good idea, actually. From my close experience with this field: once a news article is published, it is edited and republished many times, in many forms (Web, RSS, etc.). Many of these steps require manual human work, and that already limits the volume of news that gets published.
Further, much of the news actually originates from a relatively small number of sources (Reuters, etc.), so you could plug your solution in there as well.
Therefore, it should be reasonable to assume that if you put humans into the same pipeline to summarize the news manually, the capacity and efficiency would be acceptable.