Nevertheless, I think I'm learning more from this book than from most other books I've tried, which are far more theoretical or abstract. I'm still eager to reach the chapter on implementing C types. I think it's a good book, but it requires more effort than something like Crafting Interpreters or Writing a Compiler/Interpreter in Go, while also covering topics not in those books.
Plus, you get to become proficient in OCaml, which is a pretty good language.
It feels like a more advanced version of Crafting Interpreters.
I haven’t looked at the OCaml implementation at all. The text and unit tests are all you need.
Discussion on the Ada Forum: https://forum.ada-lang.io/t/writing-a-c-compiler/1024
Is duck typing the pseudo-unsafe alternative? (Not unsafe as in accessing unsafe memory, but as in throwing exceptions if the duck-typed function doesn’t exist on the current type)
Can C handle both?
Coming from a static type system like Rust and C#, I'm doing a lot of "if this is a duck, duck.quack()" and I'm looking for faster alternatives and less verbosity if possible.
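As a minimal Python sketch (mine, not from the thread) of the two usual alternatives to an explicit "if this is a duck" check: a capability test with hasattr ("look before you leap") versus just calling the method and catching the failure ("easier to ask forgiveness"). The class and method names are made up for illustration.

```python
class Duck:
    def quack(self):
        return "quack"

class Dog:
    pass  # no quack() method

def make_noise_lbyl(animal):
    # "Look before you leap": check for the capability explicitly.
    if hasattr(animal, "quack"):
        return animal.quack()
    return "..."

def make_noise_eafp(animal):
    # "Easier to ask forgiveness": call it and handle the failure.
    try:
        return animal.quack()
    except AttributeError:
        return "..."

print(make_noise_lbyl(Duck()))  # quack
print(make_noise_eafp(Dog()))   # ...
```

The EAFP style is usually considered more idiomatic in duck-typed languages, since it avoids a separate check that can fall out of sync with the call.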
data Tree a = Leaf a | Branch (Tree a) (Tree a)
fromList :: [a] -> Tree [a]
fromList = Leaf
toList :: Tree [a] -> [a]
toList (Leaf x) = x
toList (Branch (Leaf x) r) = x ++ toList r
toList (Branch (Branch l1 l2) r)
= toList (Branch l1 (Branch l2 r))
append :: Tree [a] -> Tree [a] -> Tree [a]
append = Branch
In a language that doesn't have tree pattern matching, the code wouldn't be this short and easy to understand, and I don't think it could be replicated just by having duck typing. Rust has pattern matching, but because it's primarily focused on lower-level concerns like pointers and memory ownership, pattern matching isn't this nice: you have to pattern match through the pointer first. Since a compiler is all about tree manipulation, support for tree pattern matching should be a boon.
[0]: http://h2.jaguarpaw.co.uk/posts/demystifying-dlist/
[1]: https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the_Pai...
https://archive.org/details/byte-magazine-1978-09 (part 1)
All 3 parts of Tiny Pascal:
https://albillo.hpcalc.org/publications/Easter%20Egg%20-%20T...
The most obvious change you'll see is the use of SSA, which became the dominant IR representation 25-30 years ago.
There's also been an increase in the importance of compiler IRs, and especially the concept of code passing through multiple IRs before reaching machine code.
Formal semantics has become more of a thing in the past decade or so. It's now routine that even weak memory models have a detailed formal model of how they work. In LLVM, it's now a requirement that you demonstrate formal correctness of new InstCombine transformations (which are essentially peephole optimizations).
The use of parser generators has really fallen into disrepute; everything has transitioned to handrolled parsers these days. Language standards themselves are starting to rely on context-sensitive keywords, which are hard to implement in a generator-based lexer/parser setup.
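A toy illustration (mine, not from the comment) of why context-sensitive keywords favor handwritten parsers: the lexer stays context-free and emits the word as an ordinary identifier, and the parser decides at each use site whether it acts as a keyword. Here the contextual keyword is `final` after a class name, loosely modeled on C++; the grammar and token set are invented for the sketch.

```python
import re

def tokenize(src):
    # A context-free lexer: 'final' comes out as a plain identifier.
    return re.findall(r"[A-Za-z_]\w*|[{};]", src)

def parse(tokens):
    """Parse a toy declaration list where 'final' is a keyword only
    directly after a class name, and a plain identifier elsewhere."""
    decls, i = [], 0
    while i < len(tokens):
        if tokens[i] == "class":
            name = tokens[i + 1]
            i += 2
            is_final = False
            if i < len(tokens) and tokens[i] == "final":
                # Contextual: only here does 'final' mean anything special.
                is_final, i = True, i + 1
            assert tokens[i] == "{" and tokens[i + 1] == "}"
            i += 2
            decls.append(("class", name, is_final))
        else:
            # 'TYPE NAME ;' -- here 'final' may be an ordinary name.
            typ, name = tokens[i], tokens[i + 1]
            assert tokens[i + 2] == ";"
            i += 3
            decls.append(("var", typ, name))
    return decls

print(parse(tokenize("class Duck final { } int final ;")))
# [('class', 'Duck', True), ('var', 'int', 'final')]
```

In a classic yacc setup, the same word would have to be a single fixed token kind, which is exactly what makes this pattern awkward there.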
Optimizations have generally broadened in scope: whole-function analysis is now the default way to look at code, and a variety of techniques have been introduced to make whole-program optimization (aka LTO) much more tractable.
Another subtle but major shift is that compilers increasingly rely on analysis inferred from lower-level representations rather than on the originally declared intent of the code. For example, SROA in LLVM (which breaks up structs so that individual fields can be independently allocated to registers) looks not at how the struct is declared but at the offsets within the struct that the various load and store operations use.
A final major shift is the trend towards support for ancillary tooling in the programming language space, so that the compiler isn't merely a tool that goes from source to object code, but is something that can be actively queried for information about the source code. Things like the language server, or smart formatting, or automatic refactoring tooling.
That "middle-pass" approach that will let you address many targets is still valid; the trick is finding a sufficiently robust and flexible internal representation at the right level. You also have to be able to out-guess the chip vendors where before you could go to the architect or a complete "System" book and get the real scoop, including things you shouldn't do. Oddly enough, there is simultaneously useful and completely worthless documentation scattered about the internet.
You might want to take a look at Muchnick and Jones' _Program_Flow_Analysis_ (yes, it's from 1981); chapters 4-6 can be applied at code-generation time. How that fits modern Intel processors (for example) is unknown. Idealizing your processor as a RISC-V might be a reasonable way to proceed, but in the end you'll have to translate the code for the target; it will be reasonably straightforward if you drive it all from tables, but it's not trivial.
https://news.ycombinator.com/item?id=40940799
> So what's different about writing a compiler in 2024 than say 10, 20, or 30 years ago?
As far as I can tell, the main difference is that static single assignment (SSA) as an intermediate form was not the norm 30 years ago, but it is nowadays. Also, in newer books, it's more common to go over global register allocation now, whether that's graph coloring or linear scan register allocation. If you read old compiler books, the main optimizations they talk about are use-def chains, moving computations out of loops, and using the local and tree-based Sethi-Ullman register allocation algorithm.
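As a hypothetical sketch of what SSA renaming buys you: each assignment gets a fresh numbered version of its destination, and every use is rewritten to the latest version, so def-use relationships are explicit in the names themselves. This toy handles only straight-line code (no branches, so no phi nodes); the tuple IR format is invented for the example.

```python
def to_ssa(stmts):
    """Convert straight-line (dest, op, args) statements to SSA form by
    versioning each destination and renaming uses to the latest version."""
    version, out = {}, []
    for dest, op, args in stmts:
        # Rewrite uses to the current version; leave constants alone.
        new_args = [f"{a}{version[a]}" if a in version else a for a in args]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", op, new_args))
    return out

prog = [("x", "const", ["1"]),
        ("x", "add",   ["x", "1"]),   # reads the old x, defines a new x
        ("y", "mul",   ["x", "x"])]
for line in to_ssa(prog):
    print(line)
# ('x1', 'const', ['1'])
# ('x2', 'add', ['x1', '1'])
# ('y1', 'mul', ['x2', 'x2'])
```

Once every name has exactly one definition, optimizations like constant propagation and dead-code elimination reduce to simple lookups, which is a large part of why SSA took over.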
Today most languages are front-ends for LLVM IR, but LLVM is very slow to optimize. Many new languages target x86/ARM directly with their own weakly optimizing backends and emit LLVM IR only for "release builds".
and, while we're talking about ocaml, ocaml does use ocamllex and ocamlyacc for its own parser
so, while you can certainly do without parser generators, they have very commonly been used for making real-world programming languages. almost every programming language anyone here has ever heard of was first implemented with a parser generator. the main exceptions are probably fortran, cobol, algol, lisps, c, and pascal
Nowadays, the difference between "big compiler, optimized" and "little compiler, not optimized" can be quite dramatic, but it's probably no more than 4x, which is within range of the distinction between a "systems programming language" and a "highly tuned JITted scripting language". I think most people are perfectly fine with the performance of highly tuned scripting languages. The result is that all of the overhead of the "big compiler" is just that: immaterial overhead. This is especially true for extremely well-tuned code, where the right algorithm and, as a last resort, assembly will easily beat the best optimizer by an order of magnitude or more.
Were they? GCC abandoned bison in favour of their own parser relatively recently.
On topic, though: wouldn't a simpler language (maybe even a pseudo-language) be a better target for a first learning compiler? I understand they don't build a full C compiler, but still. It looks to me like there's a lot of complexity added by choosing such a lofty target.
For writing an interpreter or transpiler, there are probably better options, but for a true compiler I can’t think of a better choice than C (or at least a subset of C).
SML is very dated and the standard library and ecosystem lack many things that are considered table stakes for a viable programming language nowadays. And F# and Scala are fine as enterprise languages, but being tied to .NET and Java respectively makes them less desirable for implementing a language that won't itself be coupled to one of those runtimes.
So OCaml was probably the most mainstream choice among the languages with appropriate tools, as funny as that sounds. And honestly, once you get over the syntax, it doesn’t actually have anything outrageous.
It's been covered on several threads here over the years [1].
[0]: https://craftinginterpreters.com/ [1]: https://hn.algolia.com/?q=crafting+interpreters
I’ve been bored with building line-of-business applications, despite designing for complex requirements in high-volume distributed systems.
In fact I took a break from CS learning entirely 9 months ago. Even reading HN. I’ve been studying electronics and analog signal processing instead.
But now that I’ve built about 50 guitar pedals of increasing complexity, I feel ready to switch back to CS studies again.
https://www.amazon.com/Writing-Compiler-Programming-Language...
https://onlinebooks.library.upenn.edu/webbin/book/lookupid?k...
Aside from that, I encourage everyone who cites Compiler Construction to actually work through the first 10% of the book and then count the number of errata.
While they teach similar content, they have a different approach.
There are literally thousands of compiler design books out there, I don't really see anything particularly comparable between this book and Wirth's
That last book was Allen Holub's "Compiler Design in C", which is from 1990. Here's how the blurb on the back describes it:
> Allen I. Holub's Compiler Design in C offers a comprehensive, new approach to compilers that proves to be more accessible to computer science students than the other strictly mathematical books.
> With this method in mind, the book features three major aspects:
> (1) The author develops fully functional versions of lex and yacc (tools available in the UNIX® operating system to write compilers), (2) he uses lex and yacc to develop a complete C compiler that includes parts of C that are normally left out of compiler design books (eg., the complete C "type" system, and structures), and (3) the version of yacc developed here improves on the UNIX version of yacc in two ways (error recovery and the parser, which automatically produces a window-oriented debugging environment in which the parse and value stacks are visible).
It's out of print, but the author has made a searchable PDF available on his website [1]. I found it quite useful.
Holub seems to like the "learn by doing" approach. He's got another book, "Holub on Patterns" that teaches all the design patterns from the gang of four book organically by developing two programs that together use all of those patterns. The two programs are an embedded SQL interpreter and a GUI application for Conway's Game of Life.
PS: Ooh. It occurred to me that No Starch Press books are often available on O'Reilly Learning. I checked and this one is there. So I guess it is going on my "don't need but am doing out of curiosity" pile after all.
Several "compiler light" style articles and books kind of walk over that part, and it can be non-trivial to do properly, especially with modern expectations.
I remember way back in the day, one of the early C compilers for the PDP, and, honestly, it could almost be argued that ed(1) had better error messages than what that thing produced.
A lot of simple compilers hit an error and just give up.
So, just curious what the approach was in this book.
[0] actually from the readme in the github repo[1] it seems to be a C subset, not all of C
(imagine a medieval accountant trying to learn to do long division in roman numerals. he'll be much better off learning the western arabic numerals fibonacci is so excited about)
Modern Compiler Implementation in ML: https://www.cs.princeton.edu/~appel/modern/ml/
As an undergrad student, I think the C version is kinda easier to understand, though.
A good engineer should be able to use the right tool for the job
https://www.amazon.com/Retargetable-Compiler-Design-Implemen...
Was featured here a couple of times.
Unfortunately, the timing of the release clashes with the summer holidays. I'll take a look at it next year.
https://news.ycombinator.com/item?id=40940799
So maybe you saw it then.
Each chapter of the book includes a test suite to run against the code you’ve written.
In some ways, the tests in this book feel very similar to the labs in the book Computer Systems: A Programmer's Perspective — which is high praise!
Alternatively, I would like to learn about not just how to make a compiler, but also simultaneously a debugger, hot-reloading, etc.
However, it's also boring.
Nevertheless, the contents of the book cover all the techniques required to write an assembler, if you'd really like to.
If there is some library that can help create machine code from assembly instructions on a line-by-line basis (as opposed to invoking a separate program that generates the entire binary from the assembly collectively), that could also work.
In my case, I already know enough of the lexer, parser, etc., parts. What's missing is going all the way to making a debugger, profiler, etc.
It's also "fun" if some instructions come in different sizes... and you may need stronger restrictions on allowed expressions in that case.
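A toy sketch (mine, not from the thread) of why variable-size instructions make assembly non-trivial: a short jump encoding only works if the target is within range, but widening one jump moves every later label, which can push other jumps out of range. So you iterate to a fixed point (branch relaxation). The 2-byte/5-byte sizes mimic x86 short vs. near jumps; the program format is invented for the example.

```python
def assemble(prog):
    """Toy branch relaxation: a jump is 2 bytes if its target is within
    +/-127 bytes of the next instruction, else 5 bytes. Returns the total
    size. 'prog' is a list of ('label', name), ('jmp', target), or
    ('bytes', n) entries."""
    # Optimistic initial guess: every jump is the short form.
    sizes = [2 if kind == "jmp" else 0 if kind == "label" else arg
             for kind, arg in prog]
    while True:
        # Compute label addresses under the current size guesses.
        labels, pc = {}, 0
        for (kind, arg), size in zip(prog, sizes):
            if kind == "label":
                labels[arg] = pc
            pc += size
        # Widen any short jump whose displacement no longer fits.
        changed, pc = False, 0
        for i, ((kind, arg), size) in enumerate(zip(prog, sizes)):
            if kind == "jmp":
                dist = labels[arg] - (pc + size)
                if not -128 <= dist <= 127 and size == 2:
                    sizes[i] = 5  # may push other jumps out of range
                    changed = True
            pc += size
        if not changed:
            return sum(sizes)

prog = [("jmp", "end"), ("bytes", 200), ("label", "end")]
print(assemble(prog))  # 205: jumping over 200 bytes needs the long form
```

Real assemblers also have to cope with expressions over labels, which is exactly where the "stronger restrictions on allowed expressions" mentioned above come from: an expression's value can change as sizes settle.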
https://www.linuxfromscratch.org/
"Linux From Scratch (LFS) is a project that provides you with step-by-step instructions for building your own custom Linux system, entirely from source code."