Where does a word begin and end? (There are no spaces.) How do I pronounce the word? (Phonetics are missing.) How do I look up the word in the dictionary? (You need to know how to type it, and how to deconjugate verbs.)
We can overcome these gaps with good software.
MeCab (compiled to WebAssembly) provides morphological analysis: it guesses where words start and end, and what kind of word each one is. Dictionaries are embedded for client-side searching. As a result, there is no backend. The application is a Progressive Web Application, so it can be saved for offline use (141MB).
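To make the three questions above concrete, here is a minimal sketch of turning MeCab's default text output into tokens. It assumes the IPAdic feature layout (base form in the 7th feature, katakana reading in the 8th); the post doesn't say which MeCab dictionary the app ships, and `parseMecabOutput` and the sample string are illustrative names and data, not the app's actual code.

```javascript
// Parse MeCab's default output format, one token per line:
//   surface<TAB>pos,pos1,pos2,pos3,conjType,conjForm,baseForm,reading,pronunciation
// ASSUMPTION: IPAdic feature order. Other dictionaries (e.g. UniDic) differ.
// Simplification: features are split on ',' without handling quoted commas.
function parseMecabOutput(output) {
  const orNull = (v) => (v === undefined || v === '*' ? null : v);
  const tokens = [];
  for (const line of output.split('\n')) {
    if (line === '' || line === 'EOS') continue; // EOS marks end of sentence
    const tab = line.indexOf('\t');
    if (tab === -1) continue; // not a token line
    const surface = line.slice(0, tab);
    const features = line.slice(tab + 1).split(',');
    tokens.push({
      surface,                                   // where the word begins and ends
      partOfSpeech: features[0],                 // what kind of word it is
      baseForm: orNull(features[6]) ?? surface,  // deconjugated form, for dictionary lookup
      reading: orNull(features[7]),              // katakana reading, for pronunciation
    });
  }
  return tokens;
}

// Illustrative sample for the sentence 食べました ("ate", polite).
const sample = [
  '食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ',
  'まし\t助動詞,*,*,*,特殊・マス,連用形,ます,マシ,マシ',
  'た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ',
  'EOS',
].join('\n');

console.log(parseMecabOutput(sample).map((t) => t.baseForm)); // [ '食べる', 'ます', 'た' ]
```

The base form is what you would key a dictionary search on, which is how tokenization sidesteps manual deconjugation.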
https://birchlabs.co.uk/mecab-web/ (Warning: 37MB webpage)
Technical notes:
There's a serious amount of dictionary data included. I culled Kanjidic from 15.5MB to 0.7MB. The remaining dictionaries gzip pretty well (138MB -> 36MB).
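The post doesn't say how Kanjidic was culled. One plausible approach, sketched here purely as an assumption, is to keep only the fields the UI actually displays and drop the rest. The entry shape and field names (`literal`, `onReadings`, `kunReadings`, `meanings`) are hypothetical.

```javascript
// ASSUMPTION: entries are plain objects; these field names are hypothetical.
const FIELDS_TO_KEEP = ['literal', 'onReadings', 'kunReadings', 'meanings'];

// Return a copy of the entry containing only the whitelisted fields.
function cullEntry(entry, keep = FIELDS_TO_KEEP) {
  const culled = {};
  for (const key of keep) {
    if (entry[key] !== undefined) culled[key] = entry[key];
  }
  return culled;
}

// Cull every entry in a dictionary.
function cullDictionary(entries, keep = FIELDS_TO_KEEP) {
  return entries.map((entry) => cullEntry(entry, keep));
}

// Example: stroke counts and cross-reference codes are dropped.
const culled = cullDictionary([
  { literal: '食', strokeCount: 9, codes: { skip: '2-2-7' }, meanings: ['eat', 'food'] },
]);
console.log(JSON.stringify(culled)); // [{"literal":"食","meanings":["eat","food"]}]
```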
Apache is configured for streaming compilation and pre-computes gzips.
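For context on why the server configuration matters: browsers only accept a response for `WebAssembly.instantiateStreaming` when it is served with the `application/wasm` MIME type. Below is a minimal loader sketch with a non-streaming fallback. The function names are hypothetical, and the app itself may rely on Emscripten's generated glue code rather than anything like this.

```javascript
// True only when the Content-Type is exactly application/wasm,
// which is what streaming compilation requires of the server.
function isStreamable(contentType) {
  return typeof contentType === 'string'
    && contentType.trim().toLowerCase() === 'application/wasm';
}

// Load a WebAssembly module, compiling while it downloads when possible.
// `url` is whatever path the .wasm file is served from (hypothetical here).
async function loadWasm(url, imports = {}) {
  const response = await fetch(url);
  const contentType = response.headers.get('Content-Type');
  if (typeof WebAssembly.instantiateStreaming === 'function' && isStreamable(contentType)) {
    // Fast path: compile during download.
    return WebAssembly.instantiateStreaming(response, imports);
  }
  // Fallback: download fully, then compile.
  const bytes = await response.arrayBuffer();
  return WebAssembly.instantiate(bytes, imports);
}
```

Both paths resolve to an object with `module` and `instance` properties, so callers don't need to care which one was taken.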
I wanted to explore whether we actually _need_ a bundler in 2019. I used @pika/web to grab libraries as ES modules.
HTTP/2 + gzip are used instead of a bundler. The source _is_ the distribution; old school. There's no backend, so the application can be served statically from a CDN.
Preact/htm/unistore are used instead of React/JSX/Redux. Libraries weigh <100KB.
Workbox is used to generate a service worker, which caches source code and assets so that the webpage can be saved as a PWA and used offline. Offline dictionaries have been done before (e.g. apps), but this is a particularly small one, and perhaps the first to provide sentence tokenization via MeCab.
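As a rough illustration of the Workbox side, a `generateSW` configuration for a site like this might look as follows. The directory, glob patterns and size limit are assumptions, not the project's real settings. The one detail worth noting is that Workbox skips files over 2 MiB by default, so large dictionaries need `maximumFileSizeToCacheInBytes` raised.

```javascript
// workbox-config.js, usable with: workbox generateSW workbox-config.js
// ASSUMPTION: all paths, patterns and the size limit below are illustrative.
const workboxConfig = {
  globDirectory: '.',                         // source is the distribution, so precache in place
  globPatterns: ['**/*.{html,js,css,wasm}'],  // add dictionary file extensions as needed
  swDest: 'sw.js',                            // generated service worker
  // Default limit is 2 MiB per file; raise it so big dictionaries are precached.
  maximumFileSizeToCacheInBytes: 150 * 1024 * 1024,
};

// Export for the Workbox CLI when run as a CommonJS module.
if (typeof module !== 'undefined') module.exports = workboxConfig;
```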
I'd love to hear your feedback, be it on language concerns, technology, or user experience.