It's a shame that parsing in CS schools is taught using theory and lex/yacc type tools when a basic lexer and recursive descent parser can be written from first principles in a few months. It is more incremental, and it gives you a much deeper feel for the concepts, plus it lets you learn a bit about software engineering and organization as well.
Big, "production quality" compilers like GCC and LLVM use RD, and so do small ones like (o)tcc and various others' "toy compiler" projects (e.g. https://news.ycombinator.com/item?id=8558822 and https://news.ycombinator.com/item?id=8576068). The "precedence climbing" method (http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm ), which simplifies RD and makes it more efficient, is common too, but I don't think I've seen it appear in any of the usual academic texts.
Something like Parsec seems like a good introduction to writing your own parsers. You can figure out formal grammars later.
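For anyone who hasn't seen precedence climbing, here is a minimal sketch in Python. The operator table, the tuple-based tree shape and the function names are my own assumptions for illustration; they are not taken from the linked article or from any real compiler.

```python
import re

# Hypothetical operator table: symbol -> (precedence, associativity).
OPS = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),
}

def tokenize(text):
    # Integers, operators and parentheses; anything else is silently skipped.
    return re.findall(r"\d+|[-+*/^()]", text)

def parse_atom(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_expr(tokens, pos, min_prec):
    # Parse one atom, then keep absorbing operators that bind at least
    # as tightly as min_prec.
    lhs, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPS and OPS[tokens[pos]][0] >= min_prec:
        op = tokens[pos]
        prec, assoc = OPS[op]
        # Left-associative operators require a strictly tighter right side.
        next_min = prec + 1 if assoc == "left" else prec
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = (op, lhs, rhs)
    return lhs, pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

One loop plus a table replaces the usual chain of one function per precedence level, which is where the simplification comes from. Under these assumptions, parse("1+2*3") yields ("+", 1, ("*", 2, 3)).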
We are still working on it and it has a long life ahead of it, since it is crucial to many very large, un-sexy companies.
Good job BTW, there are very few people who write about compiler/language theory :)
This sort of thing can be fixed up with predicates added to the grammar but it always feels like a bodge.
Perhaps someone should try to build a JSON-like language that's close to how most programmers like to write their code?
If you were going to do it, I'm certain you'd want to use a more human-writable format such as YAML or TOML instead of JSON or XML.
But doing so means you're living out Greenspun's Tenth Rule. Again, just use a Lisp. http://www.defmacro.org/ramblings/lisp.html
I think I read about this in Sedgewick's 'Algorithms in C', although I could be wrong. I struggle to find any online citation. This was mentioned in the Wikipedia article at one point, but disappeared in an edit described as "Removed heresay":
http://en.wikipedia.org/w/index.php?title=LL_parser&directio...
I use top-down recursive descent these days, backed up with memoization. When a change to the token stream comes in, you simply damage the innermost tree that contains the token or, if the change is on a boundary, damage multiple trees.
But while fun, it was pretty much useless because recursive descent and combinators are so much easier.
But it depends on how the list of regexes is used. If it is used as part of a recursive parsing routine, then it is a recursive-descent parser where lexing and parsing happen in the same pass.
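A rough sketch of the idea in Python, under simplifying assumptions of my own: the MemoParser class, the toy bracket-list grammar and the replace_token method are all hypothetical, only in-place token replacement is handled (no insertions or deletions), and every cached tree that spans the changed token is dropped rather than just the innermost one, because the enclosing trees embed it.

```python
class MemoParser:
    """Recursive descent with a memo table keyed by (rule name, start)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.memo = {}   # (rule name, start) -> (tree, end)
        self.calls = 0   # counts real (non-cached) rule invocations

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.calls += 1
            self.memo[key] = rule(self, pos)
        return self.memo[key]

    def replace_token(self, index, token):
        self.tokens[index] = token
        # Damage every cached result whose span covers the changed token.
        # The end bound is inclusive to be conservative about lookahead.
        self.memo = {
            key: (tree, end)
            for key, (tree, end) in self.memo.items()
            if not (key[1] <= index <= end)
        }

def item(p, pos):
    # Toy grammar: item := '[' item* ']' | any other token
    if pos >= len(p.tokens):
        raise SyntaxError("unexpected end of input")
    if p.tokens[pos] == "[":
        items, pos = [], pos + 1
        while pos < len(p.tokens) and p.tokens[pos] != "]":
            child, pos = p.apply(item, pos)
            items.append(child)
        if pos >= len(p.tokens):
            raise SyntaxError("unclosed '['")
        return items, pos + 1
    if p.tokens[pos] == "]":
        raise SyntaxError("unexpected ']'")
    return p.tokens[pos], pos + 1

p = MemoParser("[ [ 1 2 ] [ 3 4 ] ]".split())
first_tree, _ = p.apply(item, 0)
calls_before = p.calls
p.replace_token(6, "9")           # change the token "3" to "9"
second_tree, _ = p.apply(item, 0)
calls_for_reparse = p.calls - calls_before
```

After the edit, only the rules whose spans cover the changed token run again; the untouched sibling list comes straight from the memo table.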
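To make that concrete, here is a small sketch in Python of a list of regexes consulted from inside a recursive parsing routine, so there is no separate token stream. The PATTERNS table, the parenthesised-list grammar and the function name are assumptions of mine, not taken from the code under discussion.

```python
import re

# Hypothetical regex list, tried in order at the current input offset.
PATTERNS = [
    ("number", re.compile(r"\d+")),
    ("symbol", re.compile(r"[A-Za-z_]\w*")),
]
WHITESPACE = re.compile(r"\s*")

def parse_value(text, pos=0):
    """Parse one value starting at pos; return (value, next offset)."""
    pos = WHITESPACE.match(text, pos).end()
    if pos < len(text) and text[pos] == "(":
        pos += 1
        items = []
        while True:
            pos = WHITESPACE.match(text, pos).end()
            if pos >= len(text):
                raise SyntaxError("unclosed '('")
            if text[pos] == ")":
                return items, pos + 1
            # Recursion: nested lists are handled by the same routine.
            value, pos = parse_value(text, pos)
            items.append(value)
    # Lexing happens here, on demand, in the middle of parsing.
    for kind, pattern in PATTERNS:
        match = pattern.match(text, pos)
        if match:
            value = int(match.group()) if kind == "number" else match.group()
            return value, match.end()
    raise SyntaxError(f"unexpected input at offset {pos}")
```

If the same regex list were instead run in a flat loop that only emitted tokens, it would be a plain lexer; it is the recursive call that makes this a parser.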