> (1) Perform a fast, sequential "skeleton parsing" pass before the main parse that scans just enough to find "split points" that are consumed by the parallel parser.
I'm not able to access the full text of this paper. But from the description I wouldn't really consider this "parallel parsing." For the "skeleton parser" to be correct, it must transition through a state machine that is exactly as complex as the real parser. I suspect (again, not being able to read the paper right now) that what makes the "skeleton parse" faster than the "real parse" is not the speed of the parser itself, but the speed of the "load" on the parser.
For the skeleton parse, the "load" on the parser is just finding split points (cheap). At the application level, the "load" on the parser in many cases is building a tree of some sort. The tree-building is often significantly more expensive than the parse itself because it usually involves a lot of dynamic memory allocation.
So yes, if you do a preliminary parse that chunks up the document, and then a second parse that has a heavier load on it like tree-building, the second parse can indeed be parallelized. But I wouldn't consider this parallelizing a parser, I would consider it parallelizing the tree-building. In raw terms you have probably spent more CPU on the actual parsing logic than in the single-threaded case.
I don't mean for this to be a semantic quibble. I'm really interested in parsing architectures that decouple the parser itself from its "load." Event-based parsers like SAX parsers do this. I'm interested specifically in the parser part itself, and the limits of how it can be optimized.
> (2) Guess the state you're in based on some heuristics, and roll back on failure. This actually works surprisingly well in practice for many grammars, for example HTML [2].
Looks like an interesting paper, I'll have to dig more into that.