The difference between top-down parsing and bottom-up parsing

(qntm.org)

81 points · zniperr · 11y ago · 43 comments

29 comments · 8 top-level

carsongross · 11y ago · 6 in thread

I really didn't understand grammars until I started doing hand-rolled top-down recursive descent parsing for Gosu.

It's a shame that parsing in CS schools is taught using theory and lex/yacc type tools when a basic lexer and recursive descent parser can be written from first principles in a few months. It is more incremental, and it gives you a much deeper feel for the concepts, plus it lets you learn a bit about software engineering and organization as well.
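
As a rough illustration of the approach described above, here is a minimal sketch of a hand-written lexer plus a recursive descent parser, with one method per grammar rule. The toy arithmetic grammar and every name in it (`tokenize`, `Parser`, `evaluate`) are assumptions made for this example; none of it is taken from Gosu.

```python
import re

# Toy grammar (an assumption for illustration):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")

def tokenize(text):
    """Split the input into ('num', int) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", int(number)))
        else:
            tokens.append(("op", op))
    tokens.append(("end", None))  # sentinel so peek() never runs off the list
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind, value=None):
        token = self.advance()
        if token[0] != kind or (value is not None and token[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {token}")
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.peek()
        if kind == "num":
            self.advance()
            return val
        if (kind, val) == ("op", "("):
            self.advance()
            value = self.expr()
            self.expect("op", ")")
            return value
        raise SyntaxError(f"unexpected token {self.peek()}")

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    parser.expect("end")  # reject trailing input
    return value
```

Each grammar rule maps to exactly one method, which is what makes this style easy to grow incrementally: adding a new construct usually means adding one rule and one method.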

userbinator · 11y ago

Parsing and compilers are an area where I think "the theory gets in the way"; I started by reading the Dragon Book, and while I (vaguely) understood what it was talking about, the feeling it gave was "parsers are complex, use a parser generator instead"... which sounds like a good idea until you actually try to debug one. I eventually realised that recursive descent was much simpler, and with things like the "Let's Build a Compiler" articles everything became so clear and it felt like all that theory I learned before was mostly useless. I think the fact that top-down parsers are theoretically less powerful doesn't make much difference in practice; and the trend of compilers (in C at least) now seems to be to move away from generated parsers and to recursive descent.

Big, "production quality" compilers like GCC and LLVM use RD, and so do small ones like (o)tcc and various others' "toy compiler" projects (e.g. https://news.ycombinator.com/item?id=8558822 and https://news.ycombinator.com/item?id=8576068). The "precedence climbing" method (http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm), which simplifies RD and makes it more efficient, is common too, but I don't think I've seen it appear in any of the usual academic texts.
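
For readers who have not met precedence climbing, the following is a minimal sketch of the idea, not code from any of the compilers named above. The operator table, the tuple-based tree shape, and the names `BINARY_OPS`, `tokenize` and `parse` are all assumptions chosen for the example.

```python
import re

# Binary operators: (precedence, associativity). An assumed toy operator set.
BINARY_OPS = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
    "^": (3, "right"),
}

def tokenize(text):
    """Return a list of integers and single-character operator tokens."""
    return [int(t) if t.isdecimal() else t
            for t in re.findall(r"\d+|\S", text)]

def parse(text):
    """Parse an expression into nested (op, left, right) tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_atom():
        token = advance()
        if isinstance(token, int):
            return token
        if token == "(":
            node = parse_expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_expr(min_prec):
        # Parse one operand, then absorb operators whose precedence is at
        # least min_prec. One loop replaces one function per precedence level.
        left = parse_atom()
        while True:
            op = peek()
            if op not in BINARY_OPS or BINARY_OPS[op][0] < min_prec:
                return left
            prec, assoc = BINARY_OPS[op]
            advance()
            # Left-associative operators need strictly higher precedence on
            # their right-hand side; right-associative ones do not.
            next_min = prec + 1 if assoc == "left" else prec
            right = parse_expr(next_min)
            left = (op, left, right)

    tree = parse_expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Compared with plain recursive descent, adding an operator here is a one-line change to the table rather than a new function, which is likely part of why the method is popular in hand-written parsers.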

eru · 11y ago

Funny enough, I found the dragon book way too applied and not theoretical enough. Perhaps it was just a different feeling about the same problem: the dragon book is just not very good anymore.

Something like Parsec seems like a good introduction to writing your own parsers. You can figure out formal grammars later.

ternaryoperator · 11y ago

It is a shame that Gosu has not become more popular as a JVM language. You guys were doing things that took other languages years to catch up to.

carsongross · 11y ago

Yep, them's the breaks...

We are still working on it and it has a long life ahead of it, since it is crucial to many very large, un-sexy companies.

zura · 11y ago

Indeed. Also Groovy++ (aka Groovy with static typing) is worth mentioning.

meow_mix · 11y ago

I'm a sophomore at Ohio State and context-free grammars are actually a major topic in one of our early software courses. We also got to write a recursive descent parser for a very simple made-up language as one of our projects. Really exciting curriculum over here.

nachivpn · 11y ago · 5 in thread

Bottom-up parsing: "If not, it's necessary to backtrack and try combining tokens in different ways". I feel that putting this alongside shift-reduce parsing is misleading. Backtracking is essentially what shift-reduce parsing avoids (or rather, solves); the two don't go together in bottom-up parsing. Shift-reduce parsers possess the ability to predict which handle to use by looking at the contents on top of the stack.

Good job BTW, there are very few people who write about compiler/language theory :)

psykotic · 11y ago

For unrestricted CFG parsing, shift-reduce parsing does require backtracking. A modern parsing algorithm like GLR uses LR tables but has to resort to backtracking when it encounters shift-reduce conflicts (it avoids the exponential blow-up of backtracking by using dynamic programming to share results for subproblems).

nachivpn · 11y ago

As in my other comment, I agree with this. I forgot to mention my assumption of a forced shift or reduce action in the case of an ambiguous grammar.

canjobear · 11y ago

You need backtracking if you have an ambiguous grammar. This comes up in natural language parsing; I'd guess that it is avoided in programming languages.

aardvark179 · 11y ago

Actually, many languages which started with hand written parsers do have ambiguous grammars, or have an unambiguous grammar so hideously unwieldy it is best ignored.

This sort of thing can be fixed up with predicates added to the grammar but it always feels like a bodge.

nachivpn · 11y ago

I totally agree with this. I was assuming a default behavior, such as forcibly shifting or reducing, for ambiguous grammars, which I should have mentioned. Thanks.

Guthur · 11y ago · 4 in thread

(facetious-comment "It's a tragedy how much brilliance is wasted on grammar parsing when it's a solved problem; just use a Lisp")
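
To make the joke concrete: a reader for s-expressions fits in a couple of dozen lines, which is roughly the point being made. The sketch below is a deliberately simplified assumption, not a real Lisp reader; it handles only parentheses, integers and bare symbols (no string literals, quoting or comments), and `read_sexpr` is a name invented for the example.

```python
import re

def read_sexpr(text):
    """Parse one s-expression into nested Python lists, ints and strings."""
    # A token is a single parenthesis or a run of non-space, non-paren characters.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume ')'
            return items
        if token == ")":
            raise SyntaxError("unexpected ')'")
        try:
            return int(token)
        except ValueError:
            return token  # anything that is not an integer is a symbol

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result
```

The syntax tree is simply the nested list structure, so there is no grammar to disambiguate and no precedence to resolve.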

agumonkey · 11y ago

This was advocated by https://cs.brown.edu/courses/cs173/2012/lang/ too. Parsing is fun but in the end I can't disagree, syntax is rarely that much worth it.

skybrian · 11y ago

JSON or XML would also work. Except that few people like languages based on XML, and I haven't seen anyone seriously try JSON.

Perhaps someone should try to build a JSON-like language that's close to how most programmers like to write their code?

kyllo · 11y ago

Yeah, this is why no one likes languages based on XML: http://thedailywtf.com/articles/We-Use-BobX

If you were going to do it, I'm certain you'd want to use a more human-writable format such as YAML or TOML instead of JSON or XML.

But doing so means you're living out Greenspun's Tenth Rule. Again, just use a Lisp. http://www.defmacro.org/ramblings/lisp.html

olavk · 11y ago

Then again, this is just moving the burden from the language implementation to the developer.

zura · 11y ago · 2 in thread

One more difference: top-down parsing is European and bottom-up - American :)

twic · 11y ago

I'm not sure why this was downvoted - it's true(ish), and interesting. Historically, American computer scientists preferred LR parsers, and Europeans preferred LL parsers. That influenced the languages they designed.

I think I read about this in Sedgewick's 'Algorithms in C', although I could be wrong. I struggle to find any online citation. This was mentioned in the Wikipedia article at one point, but disappeared in an edit described as "Removed heresay":

http://en.wikipedia.org/w/index.php?title=LL_parser&directio...

rdc12 · 11y ago

That is quite interesting. At one point the main AI language in Europe was Prolog and in America it was LISP, or so I read. I wonder in what other instances this kind of cultural difference occurred, and whether the internet has had any effect on it.

vorg · 11y ago · 2 in thread

I think I read somewhere that incremental parsers are better off being written as bottom-up parsers rather than top-down parsers. The reason was that when a small edit is made to the code being parsed, the artifact from a bottom-up parser often needs only a minor change that ripples only as far as it needs to, whereas the top-down parser needs to be completely rerun because it can't tell whether the effect of one small edit is large or localized. Can anyone out there verify or refute this?

seanmcdirmid · 11y ago

I've written parsers both top-down and bottom-up. Note that neither is naturally incremental and some memoization is required: for bottom-up, you check whether the parent you want to create already exists; for top-down, you check for the child. It is sort of a wash which one is better, but both are pretty trivial.

I use top-down recursive descent these days, backed by memoization. When a change to the token stream comes in, you simply damage the innermost tree that contains the token or, if the change is on a boundary, damage multiple trees.

vorg · 11y ago

Thanks. I've been writing and using recursive descent parsers a little lately, both Parsec-style and Packrat-style. If it's possible to do decent incremental parsing without wrapping my head around LR-parsing, then I'll give it a try sometime.
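
The memoization both commenters mention can be sketched in its Packrat form: every rule is cached by input position, so a rule that is retried at the same position after backtracking costs nothing the second time. This is only the caching ingredient, not an incremental parser; a real one would also have to invalidate the cache entries that overlap an edit. The toy PEG, the lack of whitespace handling, and the names `parse` and `atom_calls` are assumptions for this example.

```python
from functools import lru_cache

def parse(text):
    """Evaluate `text` with a memoized PEG parser.

    Toy grammar (assumed):
        expr <- term '+' expr / term
        term <- atom '*' term / atom
        atom <- [0-9]+ / '(' expr ')'
    Each rule maps a position to (value, next_position), or None on failure.
    Returns (value, number of non-cached atom evaluations).
    """
    atom_calls = 0

    def peek(pos):
        return text[pos] if pos < len(text) else None

    @lru_cache(maxsize=None)
    def expr(pos):
        first = term(pos)                      # alternative 1: term '+' expr
        if first is not None and peek(first[1]) == "+":
            rest = expr(first[1] + 1)
            if rest is not None:
                return first[0] + rest[0], rest[1]
        return term(pos)                       # alternative 2: served from the cache

    @lru_cache(maxsize=None)
    def term(pos):
        first = atom(pos)                      # alternative 1: atom '*' term
        if first is not None and peek(first[1]) == "*":
            rest = term(first[1] + 1)
            if rest is not None:
                return first[0] * rest[0], rest[1]
        return atom(pos)                       # alternative 2: served from the cache

    @lru_cache(maxsize=None)
    def atom(pos):
        nonlocal atom_calls
        atom_calls += 1                        # counts real evaluations only
        end = pos
        while end < len(text) and text[end].isdecimal():
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if peek(pos) == "(":
            inner = expr(pos + 1)
            if inner is not None and peek(inner[1]) == ")":
                return inner[0], inner[1] + 1
        return None

    result = expr(0)
    if result is None or result[1] != len(text):
        raise SyntaxError("could not parse input")
    return result[0], atom_calls
```

Without the cache, each level of nesting in an input such as `((((1))))` would re-parse its contents several times; with it, `atom` is evaluated once per position.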

slashnull · 11y ago · 1 in thread

I was so happy when I finally grokked LR parsers: it's just a big state machine! _If_ that token is found, push it on the stack and go to _that_ other state. Consume the token, check the next state transition.

But while fun, it was pretty much useless, because recursive descent and combinators are so much easier.
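
The "big state machine" view can be sketched as a table-driven driver loop. The tables below are an SLR(1) table worked out by hand for a toy grammar, so they are an assumption of this example rather than the output of any particular generator; the names `PRODUCTIONS`, `ACTION`, `GOTO`, `reduce_on_follow` and `parse` are likewise made up.

```python
# Toy grammar (assumed):
#   1: E -> E + T     2: E -> T     3: T -> n     4: T -> ( E )
# Production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1), 4: ("T", 3)}

def reduce_on_follow(production):
    # In this grammar FOLLOW(E) and FOLLOW(T) are both {'+', ')', '$'}.
    return {terminal: ("reduce", production) for terminal in "+)$"}

# ACTION[state][terminal] -> ("shift", state) | ("reduce", production) | ("accept",)
ACTION = {
    0: {"n": ("shift", 3), "(": ("shift", 4)},
    1: {"+": ("shift", 5), "$": ("accept",)},
    2: reduce_on_follow(2),
    3: reduce_on_follow(3),
    4: {"n": ("shift", 3), "(": ("shift", 4)},
    5: {"n": ("shift", 3), "(": ("shift", 4)},
    6: {"+": ("shift", 5), ")": ("shift", 8)},
    7: reduce_on_follow(1),
    8: reduce_on_follow(4),
}

# GOTO[state][nonterminal] -> state entered after a reduction
GOTO = {0: {"E": 1, "T": 2}, 4: {"E": 6, "T": 2}, 5: {"T": 7}}

def parse(tokens):
    """Return the production numbers applied, in the order they were reduced."""
    tokens = list(tokens) + ["$"]   # '$' marks end of input
    stack = [0]                     # stack of states
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if action[0] == "shift":
            stack.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            lhs, length = PRODUCTIONS[action[1]]
            del stack[-length:]                      # pop the handle
            stack.append(GOTO[stack[-1]][lhs])       # follow the goto edge
            reductions.append(action[1])
        else:
            return reductions
```

Here `tokens` is any iterable of the terminal symbols `n`, `+`, `(` and `)`, so a plain string such as `"n+(n)"` works. The driver loop never changes; all grammar-specific knowledge lives in the two tables, which is what a parser generator produces.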

phyzome · 11y ago

That moment when you finally get it that CFG parsers can be implemented using push-down automata... :-)

jimmaswell · 11y ago · 1 in thread

What do you call it if you're just using a big list of regexes? I've seen that used for a simple dialog scripting language in a game.

olavk · 11y ago

If the language is simple enough to be parsed with only regular expressions, then it is a regular language and does not need a context-free parser, so the bottom-up/top-down distinction does not apply. This also means that the language cannot have arbitrarily nested (self-embedding) productions, e.g. it cannot support parenthesized expressions.

But it depends on how the list of regexes is used. If it is used as part of a recursive parsing routine, then it is a recursive-descent parser where lexing and parsing happen in the same pass.
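
The second case can be sketched as follows: a handful of regexes matched directly against the raw text from inside a recursive function, so there is no separate token list. The nested-list syntax is an assumed stand-in for whatever the game's scripting language looked like, and `parse_value` and `parse` are names invented for the example.

```python
import re

# Each pattern skips leading whitespace and is applied to the raw text,
# so lexing happens at the moment the parser asks for a token.
WORD = re.compile(r"\s*([A-Za-z_]\w*)")
OPEN = re.compile(r"\s*\[")
CLOSE = re.compile(r"\s*\]")
COMMA = re.compile(r"\s*,")

def parse_value(text, pos=0):
    """value := WORD | '[' (value (',' value)*)? ']'   (toy grammar, assumed)

    Returns (parsed value, position after it).
    """
    m = WORD.match(text, pos)
    if m:
        return m.group(1), m.end()
    m = OPEN.match(text, pos)
    if not m:
        raise SyntaxError(f"expected word or '[' at offset {pos}")
    pos = m.end()
    items = []
    m = CLOSE.match(text, pos)
    if m:
        return items, m.end()               # empty list
    while True:
        item, pos = parse_value(text, pos)  # recursion handles nesting
        items.append(item)
        m = COMMA.match(text, pos)
        if m:
            pos = m.end()
            continue
        m = CLOSE.match(text, pos)
        if not m:
            raise SyntaxError(f"expected ',' or ']' at offset {pos}")
        return items, m.end()

def parse(text):
    value, pos = parse_value(text)
    if text[pos:].strip():
        raise SyntaxError(f"trailing input at offset {pos}")
    return value
```

The regexes on their own only recognise flat pieces; it is the recursive call in `parse_value` that lets the language nest to arbitrary depth.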

vbit · 11y ago

Another good write-up on this topic is http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demyst...
