If you're running a service (say, something like DocumentCloud), you're way better off precomputing a full-text index on ingest and providing a search API than shunting substantial parts of your stored documents over to the client.
Definitely cool as a piece of gear, but not terribly practical from a client-side perspective, I'd think.
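To make the "index on ingest, search via API" idea concrete, here's a rough sketch in Python. It's a toy inverted index, not how DocumentCloud or any real service does it; the `FullTextIndex` class, its method names, and the sample documents are all made up for illustration. A real deployment would more likely sit on top of a dedicated search engine.

```python
import re
from collections import defaultdict


class FullTextIndex:
    """Toy inverted index (hypothetical): token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def ingest(self, doc_id, text):
        # Done once, server-side, when the document is uploaded.
        for token in self.tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        # AND semantics: return ids of documents containing every query token.
        tokens = self.tokenize(query)
        if not tokens:
            return []
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


# Made-up sample documents, for illustration only.
index = FullTextIndex()
index.ingest("doc-1", "Quarterly budget report for the city council")
index.ingest("doc-2", "City council meeting minutes")
hits = index.search("city council")
```

The point is that the client only ever sends a query and gets back a short list of ids, rather than pulling down document text to search it locally.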
For what it's worth, it looks like DocumentCloud uses Open Calais, which is a Thomson Reuters product. I used to work there in a different division; they have a bunch of interesting products in this space.
I notice your blog is filled with NLP-related goodies. I've been meaning to screw around with the Stanford NER lib, to see if I can train up some custom recognizers that are of any use for particular document domains.
Perhaps for PDFs that are proprietary or sensitive. A related use case is transformation and extraction. I recently used this same technique for a client to turn VB6-generated PDF reports into HTML tables for preview, and to send the actual data to a service endpoint as JSON.
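Roughly, once the text has been pulled out of the PDF into rows, the transformation step looks something like the sketch below. This is Python for brevity and is not the client code; the function names, the column headers, and the sample rows are invented for illustration, and it assumes the extraction step has already produced a header plus a list of string cells per row.

```python
import html
import json


def rows_to_html_table(header, rows):
    # Escape every cell so report text cannot inject markup into the preview.
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def rows_to_json(header, rows):
    # One object per row, keyed by column header, ready to send to an endpoint.
    return json.dumps([dict(zip(header, row)) for row in rows])


# Made-up rows standing in for text already extracted from a PDF report.
header = ["Account", "Balance"]
rows = [["A & B Ltd", "100.00"], ["Acme", "250.50"]]

table = rows_to_html_table(header, rows)
payload = rows_to_json(header, rows)
```

The same rows feed both outputs, so the preview and the data sent to the service can't drift apart.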