Actually even reading the article I'm not completely certain where that 1000x comes from, I suppose it's when compared to the naive approach.
> This is three orders of magnitude less expensive (36 terms for n=9 and d=2)
Three orders of magnitude is 1000. Peter Norvig's approach was described above with:
> still expensive at search time (114,324 terms for n=9, a=36, d=2)
So 114,324 / 36 = 3,175, so "three orders of magnitude", and I suppose he went conservative by saying "1000x" rather than "3000x".
Some power languages (Dutch, German) have an infinite number of words, as you can take 2 nouns (fe a and b) and concatenate them to form a new one (ab means something else than ba). Some languages also have inflection....
The downside to any preorder algorithm is that you must rerun the preorder generation code anytime the input changes (in this case a dictionary) and often you must allocate extra storage to hold the pre-processed input.
This is a really interesting algorithm but, as always, you have to know the pros and cons and apply it where if fits (i.e. Not somewhere that needs a compressed dictionary).
[1] http://www.codeproject.com/Articles/31669/Hierarchical-Tree-...
http://java.dzone.com/news/lucenes-fuzzyquery-100-times
Looks like some interesting research and dev going on in this space at the moment.
They don't mention the approach I'd take for a naive spellcheck:
Generate a (giant) trie from the dictionary. Then perform a DFS on the trie allowing only <the current closest word's number of edits> edits from the input word, keeping track of the edits done so far - try no changes at each node first, then a deletion, then a change, then an insertion. This works better with a smaller alphabet size.
An optimization: a DAWG is potentially even better space-wise than a hashmap (effectively it ends up compressing the input data), but takes a while to precompute.
BTW - if anyone's interested in the Go implementation let me know, I'll try to find it and post it somewhere.
Am I mistaken?