Rule of thumb: parsing/lexing shouldn’t take more than 10% of your compiler course.
I get that parsing is more of an implementation detail and doesn't really belong to the space-brained realm of language design per se, but it's a bit annoying that most texts refuse to give any space to the topic, and instead rely on your language being S-expression based, or assume you're going to use a parser generator. Like, in the real world, even if you never actually implement a fully-fledged programming language, you're still probably going to have to parse things sometimes. I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that
It reminds me somewhat of the situation in analysis, where there are lots of theorems that aren't written down anywhere because literally every book states them as "easy" exercises. Maybe I'm looking in the wrong places, but I can't find much in the way of concrete guidance on implementing parsers. I'm aware of the beautiful series on parsing theory by Aho & Ullman ("The Theory of Parsing, Translation, and Compiling"), but those volumes are focused more on theory than on implementation
> I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that
Terence Parr's "Language Implementation Patterns" spends quite a bit of time on parsing, and on parse tree -> AST conversions.
That is definitely true, but in practice there isn't much to say about it, because sophisticated parsers turn out not to be particularly important; it works out better overall to design simple grammars, and then the parsing is easy.
- If you're a beginner, you'll write a recursive descent parser, because that's the simplest technique, and it lets you focus on your project instead of a new, unfamiliar tool.
- If you're writing a domain-specific language, or a config format, or something of that nature, you'll use whichever parser generator integrates most conveniently into your workflow, and you'll design your grammar around whatever its manual tells you to do.
- If you're writing a full-scale language compiler, you'll go back to recursive descent, because that offers the easiest way to recover from errors and report informative messages. Maybe you'll throw in precedence-climbing for operators.
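The last two bullets together can be sketched in a few dozen lines. Below is a minimal, hypothetical recursive-descent parser for arithmetic expressions that uses precedence climbing for binary operators; all names and the token/AST representations are illustrative, not taken from any particular compiler or from the series under discussion.

```python
# Recursive descent + precedence climbing, sketched over a tiny
# expression grammar: integers, + - * /, and parentheses.
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for num, other in TOKEN_RE.findall(src):
        if num:
            tokens.append(("num", int(num)))
        elif other.strip():
            tokens.append(("op", other))
    tokens.append(("eof", None))
    return tokens

# operator -> precedence; all operators here are left-associative
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_atom(self):
        kind, value = self.advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            expr = self.parse_expr(0)
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return expr
        raise SyntaxError(f"unexpected token {value!r}")

    def parse_expr(self, min_prec):
        # Precedence climbing: keep consuming binary operators whose
        # precedence is at least min_prec, binding tighter ones to the right.
        lhs = self.parse_atom()
        while True:
            kind, op = self.peek()
            if kind != "op" or op not in PRECEDENCE or PRECEDENCE[op] < min_prec:
                return lhs
            self.advance()
            # prec + 1 makes the operator left-associative
            rhs = self.parse_expr(PRECEDENCE[op] + 1)
            lhs = (op, lhs, rhs)  # AST node as a plain tuple

def parse(src):
    return Parser(src).parse_expr(0)

print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

The appeal for error reporting is visible even here: each `raise SyntaxError` fires at a point where the parser knows exactly what it was expecting and where, which is much harder to arrange with a generated table-driven parser.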
> I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that
I would also read such a book, but it would be more of a book about parser generators than a book about parsers.
For every big, deep, native code compiler, there are a hundred template languages, config files, report generators, etc. all of which are real programs providing real value for actual people.
Emphasizing parsing provides the most value for the greatest number of people. The folks that do end up needing more back end depth will still have the resources available to learn it.
Everybody and their dog thinks it necessary to inflict some new sub-par language on us, when in about 99.9% of cases they should have stuck either to s-expressions or to some suitable subset of a popular programming language or an existing config language with a relatively sane syntax (blaze/bazel got that right; cmake got that very, very wrong).
When was the last time you looked at some config file and thought, wow, I'm so glad they didn't use toml or python or whatever, but instead made up some completely new syntax that nothing in the world apart from this tool itself can parse and that I can't programmatically manipulate?
When was the last time you thought, wow, I am so glad that someone invented a new templating language that creates some new injection vulnerabilities? No one apart from the lisp people ever seems to have worked out that if you want to interpolate into something tree-shaped, you should have a tree-based interpolation syntax. Although sexps and quasiquote solve this very nicely and concisely, everyone else still seems to love string-bashing plus some ad-hoc "escaping" system. And one reason for this is of course precisely the enormous abundance of idiotic config languages that can't be easily manipulated as anything other than opaque strings.
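The tree-based-interpolation point can be sketched in Python. The thread is talking about Lisp quasiquote; this uses the standard library's ElementTree instead, which is just the same idea applied to XML, and the payload string is of course made up:

```python
# String templating vs tree-based interpolation of untrusted input.
import xml.etree.ElementTree as ET

user_input = '<script>alert("pwned")</script>'  # hypothetical attacker payload

# String-bashing: the payload lands in the markup verbatim,
# and correctness depends on remembering an ad-hoc escaping step.
string_templated = f"<p>{user_input}</p>"

# Tree-based: the value is attached as a text node of the tree,
# so serialization escapes it; there is no way to "break out" of the node.
p = ET.Element("p")
p.text = user_input
tree_templated = ET.tostring(p, encoding="unicode")

print(string_templated)  # <p><script>alert("pwned")</script></p>
print(tree_templated)    # <p>&lt;script&gt;alert("pwned")&lt;/script&gt;</p>
```

The injection vulnerability simply has no place to live in the second version, because "text of a node" and "structure of the tree" are distinct in the data model rather than distinguished by escaping conventions in one big string.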
[Edit: if you do create a new config file language, pretty please provide some means to directly query and losslessly manipulate it. For the lossless part you will need either first-class comments unambiguously attached to a particular syntactic construct plus agreed-upon, deterministic formatting, or IDE-style complexity; the first is probably the better idea]
Unlike most compiler articles, this one actually covers code generation in every chapter, which is really great.
I also like that each chapter focuses on a specific feature and describes how to implement it end to end: lexing, syntactic parsing, the AST, and x86_64 code generation.
Great series!
https://en.wikipedia.org/wiki/Logic_for_Computable_Functions
> print("You have some form of undefined behavior, which means printing this is a valid response per the C standard")

For C++ IFNDR ("Ill-formed, No diagnostic required") the situation is trickier, because the affected programs (some unknowable but likely large proportion of all purported C++ code) are not well-formed C++, and the standard offers no hint as to what happens or why, since it constrains only the behaviour of a C++ compiler for well-formed C++ programs.
† It's possible the C lexer claims to have some "Undefined Behaviour" cases like the C++ lexer, hence P2621 "UB? In my lexer?" (a reference to a 2005 meme, because C++ standards committee members are down with the kids). If so, that's clearly a bug in the standards text: it makes no sense to have UB in the lexer. These should just be ill-formed programs, and you get a compiler error.