Anything that needs to be data should be data, not images. Except for some very specific cases, you're not doing anybody any favors by outputting PDF. That format is a data black hole. It allows you to transmit very well-formatted output, but it absolutely stops you from reliably reusing any of that content.
I beg you all: if it's anything that contains data, or really, if it's anything for which layout and formatting is not absolutely critical, please don't use PDF. Send data as data.
Every few months here, we get a customer asking why we can't automatically handle purchase orders that they send us in PDF format, and every time they get the same explanation.
The only thing I can't do easily is edit the documents, but I don't need to do that with scanned documents (e.g. tax statements).
Also, Tesseract often needs to be configured.
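For anyone wondering what "configured" means in practice, here is a minimal sketch. It assumes Tesseract 4 or later is on the PATH (older 3.x releases spelled the option `-psm`) and Python 3.7+. It passes the two options people most often tune: the language data (`-l`) and the page segmentation mode (`--psm`). `build_tesseract_command` and `run_ocr` are hypothetical helper names, not part of Tesseract itself.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> List[str]:
    """Build a Tesseract CLI invocation that writes recognized text to stdout.

    `-l` selects the trained language data (e.g. "eng", or "eng+deu" for two
    languages) and `--psm` the page segmentation mode (3 = fully automatic,
    6 = assume a single uniform block of text).
    """
    # Tesseract 4+ accepts page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    # Using "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

Usage would look like `run_ocr("scan.png", lang="eng+deu", psm=6)` (the file name is just an example). A scanned statement with a single block of text often does better with `--psm 6` than with the automatic default.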
You can upload a file and also specify the format when downloading it. I'd guess that includes applying OCR where necessary.
I may not rely entirely on Google Docs for my OCR needs in the future ;)
I find it hard to believe Google would release this if it was that useless (no jokes about Buzz or Wave, please).
Pretty cool.
I wonder what they're using for Google Goggles and this.