I haven't exactly searched far and wide, but this one is pretty good.
This is a perfectly valid thing to do, but it is changing the power of regexes significantly. I feel the article makes this fact easy to overlook—I certainly did until I took a class on automata.
But yes, presumably Perl could use an NFA-based algorithm if given a regex that is actually regular. One of my professors attributed the fact that it doesn't to old patent issues, but I have not been able to find a good source for that. It's also possible that it's just a case of worse is better: backtracking is fast enough 99% of the time, and if performance really matters, perhaps you shouldn't be using regexes anyhow.
In practice, the vast majority of production regexen don't require backreferences or generalized assertions, so automata are an attractive option. This is particularly true for security applications (e.g. firewalls/NIDS) and public-facing services like Google Code Search. I wouldn't be surprised if people were attacking Snort boxes via PCRE, for example.
*Probably. Depends if P=NP or not.
The Thompson NFA has its place as a fallback to avoid exponential blowup on pathological regexes, but if I were implementing a performance-critical regex engine, I would start with a JIT for the recursive backtracking approach, because in practice constant factors dominate over algorithmic worst-time bounds, and only add the Thompson NFA later.
I don't feel the constant factors are large on Thompson. It's essentially linear to construct IIRC, and matching next char is then proportional to the number of current States which are valid. Which is bounded by the vertexes in the NFA which will mostly be less than DFA.
I don't see how you can get more efficient than that, because you're just keeping track of all the matching states which would seem to be the minimum required. I'm really keen to learn an even better way.
The proposal is a workable heuristic ( use the practically working algorithm rather than the one with a theoretical low bound which is costly in practice ), and people do well to heed it in many places, however I don't believe it applies here because Thompson is great in practice.
You sound like you know a lot about this, what am I missing?
http://www.fing.edu.uy/inco/cursos/intropln/material/p419-th...