Also, it's an "API" (looks more like a URL pointing to a CGI program to me, but whatever). APIs are "cool" and "fun", while running local programs that you have control over is old and boring and not the future of computing.
As for how we use your data, our privacy policy will clarify that.
:)
I love the happy little Unix-style legos of productivity.
This follows the API documented by Stamplin (minus the throttling errors)--it does not currently do the OCR, but as mentioned elsewhere by zdw you can probably get tesseract to get you like 80% of the way there. If you wanted to use that, you'd likely just replace the hacky `pdftotext` callout with your preferred toolchain.
You'll need Ruby, Sinatra, and the Xpdf tools, I believe.
Dual-licensed under the AGPL, BSD, and WTFPL licenses. idklol.
The code:
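  require 'sinatra'
  require 'json'

  use Rack::Logger

  post '/extracttext' do
    begin
      # No file uploaded: nothing to convert.
      halt 204 if params["file"].nil?

      # Accepted for compatibility with the documented API; not used yet.
      type = params["type"] || "text"
      lang = params["lang"] || "en"

      tmpfilename = params["file"][:tempfile].path
      txtfilename = "#{tmpfilename}.txt"

      # Pass the arguments as a list (no shell involved) and name the
      # output file explicitly instead of relying on pdftotext's default.
      system("pdftotext", tmpfilename, txtfilename) or raise "pdftotext failed"
      File.delete(tmpfilename)

      # One array entry per line of extracted text.
      lines = File.read(txtfilename).split("\n")
      File.delete(txtfilename)

      content_type "application/json"
      {"text" => lines}.to_json
    rescue
      status 500 and return
    end
  end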
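For the tesseract route mentioned above, here is a rough, untested-against-real-scans sketch of what could replace the `pdftotext` call. It assumes `pdftoppm` (from the same Xpdf tools) and `tesseract` are on the PATH, and that your tesseract build can read the PPM images `pdftoppm` writes. The helper names and the two-letter-to-three-letter language table are made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical mapping from the API's two-letter "lang" values to
# tesseract's three-letter language codes; extend as needed.
TESSERACT_LANGS = { "en" => "eng", "fr" => "fra", "de" => "deu" }.freeze

# Rasterize step: pdftoppm writes one image per page, named after the prefix.
def rasterize_command(pdf_path, prefix, dpi = 300)
  ["pdftoppm", "-r", dpi.to_s, pdf_path, prefix]
end

# OCR step: tesseract writes its result to "<out_base>.txt".
# Unknown languages fall back to English.
def ocr_command(image_path, out_base, lang = "en")
  ["tesseract", image_path, out_base, "-l", TESSERACT_LANGS.fetch(lang, "eng")]
end

# Run both steps in a scratch directory and return the recognized lines.
def ocr_pdf(pdf_path, lang = "en")
  Dir.mktmpdir do |dir|
    prefix = File.join(dir, "page")
    system(*rasterize_command(pdf_path, prefix)) or raise "pdftoppm failed"
    Dir.glob("#{prefix}*").sort.flat_map do |image|
      system(*ocr_command(image, image, lang)) or raise "tesseract failed"
      File.read("#{image}.txt").split("\n")
    end
  end
end
```

In the route you would then call something like `lines = ocr_pdf(tmpfilename, lang)` in place of the `pdftotext` step.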
EDIT: For God's sake run this in a jail and only on an internal network!
Here is the PDF I used to test: https://www.gov.uk/government/uploads/system/uploads/attachm...
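If you want to poke at it from Ruby rather than curl, a client could look roughly like this. It is a sketch only: it assumes the app is running locally on Sinatra's default port 4567, and the helper names are invented for the example.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build a multipart POST carrying the PDF in the "file" field,
# which is the parameter the route reads.
def build_request(uri, pdf_io, lang = "en")
  request = Net::HTTP::Post.new(uri)
  request.set_form([["file", pdf_io], ["lang", lang]], "multipart/form-data")
  request
end

# The service answers with {"text": [line, line, ...]}.
def parse_lines(body)
  JSON.parse(body).fetch("text")
end

# Upload a PDF and return the extracted lines.
def extract_text(endpoint, pdf_path)
  uri = URI(endpoint)
  File.open(pdf_path, "rb") do |pdf|
    response = Net::HTTP.start(uri.host, uri.port) do |http|
      http.request(build_request(uri, pdf))
    end
    raise "server returned #{response.code}" unless response.code == "200"
    parse_lines(response.body)
  end
end
```

Usage would be along the lines of `puts extract_text("http://localhost:4567/extracttext", "test.pdf")`, where `test.pdf` stands in for whatever file you have on hand.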
Is there a technical reason for the 1-2MB limit or is it arbitrary?
That's something we can provide pretty easily, and we will try to include it in our next release. If you want us to help you with your specific problem, please send us an email at info@stamplin.com.
The limit is there to keep our server from crashing, since we do not currently have the financial means to support a massive server farm. Again, if this limit prevents you from using our API, we may raise it if you ask by email.
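For anyone self-hosting a similar service, a cap like this can be checked before any conversion work starts. The following is only a sketch: the 2 MB figure and the helper name are assumptions for illustration, not a description of how the hosted API enforces its limit.

```ruby
# Assumed cap: the upper end of the 1-2MB limit discussed above.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024

# True when the file at `path` is small enough to hand to the converter.
def upload_allowed?(path, limit = MAX_UPLOAD_BYTES)
  File.size(path) <= limit
end
```

In a Sinatra route this could be used as `halt 413 unless upload_allowed?(tmpfilename)`, so oversized uploads are rejected with "Payload Too Large" instead of tying up the server.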
/\n\nnmrs wn\ufb02qyi mm mm\nTlIIEI\ufb02|\ufb02llllM\u2018l co
I'm excited to try this, so figure out a way to take my money soon.
Any chance you could release the code as Free or open source so that its users can use it standalone on their own machines?
1) Mobiles have pretty good CPUs. I think uploading and waiting for a response would be slower and less reliable.
2) If the mobile user doesn't have an internet connection, the app won't work.
3) As a developer, I would be dependent on an external service, which could stop working someday.