However, if you are interested in OCR from Go without C complicating building and cross-compiling, there aren't any other options.
Wazero is a Go WASM runtime that doesn't have any CGo dependencies. Tesseract has been compiled to WASM with Emscripten and is run within Wazero.
Gogosseract provides a simple API on top of this. This project has been an interesting delve into the world of WASM.
It can work, but it's not the fastest thing in the world.
I think languages that make working with C/C++ code much more seamless, e.g. as nice as working with Go code can be, are a better approach. Zig does this well and feels quite natural coming from Go. It can also be used to make CGo cross-compilation 'just work' and alleviate many of those pains.
Go’s FFI support is alright, but I find using WASM/WASI more pleasant.
https://github.com/wasilibs/go-zstd
Mostly because I hadn't found that `compress` supports zstd. Wazero performed reasonably well against the cgo library but was indeed much slower than this proper pure Go port.
That seems like quite an undertaking. But at that point, it would make sense to cut out WASM entirely, like https://datastation.multiprocess.io/blog/2022-05-12-sqlite-i...
https://github.com/ncruces/go-sqlite3
One of the problems of the modernc approach (IMO) is that they're not just transpiling CPU/compute stuff, but entirely OS/platform stuff.
Each Go file of theirs is a xxx_os_arch.go that starts with 100s of OS-#defines-as-consts, and goes on to transpile fully #ifdefed code.
It also implements antithetical (in Go) stuff like goroutine local storage, because libc pthreads can't live without it.
And all IO is via direct syscalls that will never play nice with the Go scheduler, because again, this is OS level stuff.
WASM defines a cross-platform CPU and an ABI. Using that for compute, with the bottom OS layer written in Go, you get (IMO) a nicer end result.
Given how hard it is to generate decent code from WASM at load time (wazero's compiler is pretty naive; a better one is being developed, but it will take seconds to generate good code for anything non-trivial like SQLite), I wouldn't mind having a solution that translated to Go, or Go ASM, at build time.
Since OCR is a somewhat slow process, how does the WASM approach compare to running libtesseract in a subprocess and using some IPC layer to talk to Go? It would require a separate C++ compiler, but not CGo.
> one of the largest Open Source OCR
Tangential, but are there others as large as Tesseract? It seems to pop up anywhere I look.
The one serious competitor is PaddleOCR, which is faster on GPU and also works better for Chinese and other non-Western scripts.
There are some newer ML-based projects like DocTR that have been catching up, at least for some use cases.
I imagine just calling the Tesseract CLI from Go would be simplest if that's all you wanted.
How much difference is there between Tesseract and the best proprietary solutions?
When looking at the “best” prop solution, there are a few worth mentioning:
- If you are looking for the best OCR to DOCX solution, ABBYY OCR SDK is the front runner. Their OCR engine is not AS accurate as others I’ll mention, but their output engine (i.e. capturing data beyond just the characters, like bold, underline, or font name) is probably the best on the market.
- Google Document AI/Cloud Vision is probably the best all-around OCR. The 2 flavors determine whether you want to handle scanned PDFs/images (DocAI) or generalized photos (Cloud Vision). I believe they also have some level of training capabilities via Vertex but I haven’t checked it out.
- IRIS OCR.. Meh
- AWS Textract and Azure Vision are worth mentioning as contenders, but just like Google Document AI, they’re cloud based and that may factor into your decision.
- I haven’t tried DocTR or Paddle OCR
The only feature missing right now is Bounding Box detection, which I plan to add in the future.
I think this method really shines in Go as not having CGo simplifies a lot of things, and as a decently performant JITed runtime exists in the form of wazero.