A toy programming language in 137 lines of Python code

(blog.miguelgrinberg.com)

121 points · miguelgrinberg · 2y ago · 44 comments

29 comments · 11 top-level

29athrowaway2y ago· 8 in thread

Once you have an intuition, you can use yacc.

3cats-in-a-coat2y ago

Once you have an intuition, you just write the parser yourself.

I'm not a fan of parser generators. I find them restrictive. It is also notable that most mainstream languages do not use tools like YACC; they have "handmade" parsers and compilers.

This may seem like I'm suggesting a pie-in-the-sky approach from the ground up for something that may never need such control. Could be. But I don't feel like learning YACC is easier than learning how to parse anything in general. I think YACC and so on are holding people back from truly expressing themselves with custom languages, and after all, that was the whole point of them writing a language, wasn't it?

29athrowaway2y ago

So let's say, you decide to create your own programming language, now you can automatically afford to do as much work as the organizations behind mainstream programming languages? Organizations with thousands of contributors, funding, etc.?

Now, let's omit that for a moment. Let's say that, having implemented your language after years of work, you now for some reason get one prospective user.

That person will ask you about: syntax highlighting, linting, code navigation, testing, packaging, documentation, operating systems/architecture support, etc. Who is going to contribute all that if you are busy writing everything by hand?

Unless you have as much time as Terry Davis, you are better off starting by piggybacking on something that exists. Then, once you have an idea that scales, you can convince other people to help you and have a successful project. Then you can have a viable ecosystem that people can comfortably join.

Tade02y ago

I'm not well versed in this space. Are you suggesting designing a language without specifying a formal grammar?

miguelgrinbergOP2y ago

Knowing how to build a parser manually will make you much much better at generating, and most importantly debugging, automatically generated parsers.

29athrowaway2y ago

Yes. And once you are done learning every aspect of it, one day you will look at yourself in the mirror and you will be 60 years old.

laxd2y ago

And lose touch with your code and have to deal with problems through a heavily abstracted layer. Great!

29athrowaway2y ago

You will likely not get a perfect syntax right away. So you will need to iterate on it until you are happy with how it looks and feels, and doing it while developing the compiler yourself will naturally lead to a lot of unnecessary bugs and tech debt.

Alternatively, you iterate using a cheaper method and once you reach a point in which you are content with the syntax, you can consider writing a compiler for it.

You can also avoid iterating on the syntax so you can ship a compiler as fast as possible, but users won't like the syntax and won't use your language.

remexre2y ago

I don't really think of yacc as having much more abstraction than a regex engine -- like in an (NFA-based) regex engine, you're building a big data structure a human wouldn't want to hand-engineer from a declarative specification. (And, like in regexes, you've got good theory backing it.)

Fraterkes2y ago· 3 in thread

This seems great. When I read these kinds of explanations, it always strikes me how the way you're supposed to write a programming language is basically the same way someone who had never written one might do it. Just break the input up into units, map the units to some category, and connect functionality to that. Like if you forced someone who had 2 weeks of programming experience to come up with a way to turn code into programs, they would probably think of something like the lexer/parser/evaluator process if you gave them enough time.

lmm2y ago

You might enjoy https://hokstad.com/compiler . I could never get on with the traditional top-down presentation of building a compiler where you start with this idea that you're going to build a parser and a lexer and all that. Seeing someone build a compiler the same way you'd build a regular program - start from the simple cases and expand out to cover what you need to - was a real revelation. In practice it's ended up with something pretty similar to the standard architecture, but now I can understand why.

IIAOPSW2y ago

If you gave them enough time, they would have more than 2 weeks of programming experience.

bombolo2y ago

From personal experience… The stuff I came up with before I studied compilers didn't resemble that structure at all.

tomp2y ago· 2 in thread

short code review:

- the `tokens` method returning a variable-length tuple is a very bad idea; modern Python supports the `datatype` decorator which should be used in this case

- I'd strongly recommend using a regex for lexing; much easier to get it right than a long if/elif/else block

- I'd recommend refactoring code and implementing the `next_token` logic so that it works for any generator; an example implementation could be (WARNING: I wrote this in 5min, would require extensive unittests!)

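    import itertools

    class Peek:
      """Iterator wrapper; `current` is the next item to be returned (None at the end)."""
      def __init__(self, it):
        self.cur, self.nxt = itertools.tee(it, 2)
        self.current = next(self.cur, None)  # lookahead item
      def __iter__(self):
        return self
      def __next__(self):
        item = next(self.nxt)  # raises StopIteration once exhausted
        self.current = next(self.cur, None)  # advance the lookahead without raising at the end
        return item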

- statements with a "prefix" (like `print X` or `if Y`) are very easy to parse (just use the lookahead token!) but when you get to parsing expressions, I strongly recommend using a Pratt parser (extensible operator precedence parsing)
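
A minimal sketch of how the dataclass, regex and Pratt-parser suggestions above might fit together, assuming a toy arithmetic grammar. The names `Token`, `tokenize`, `BINDING_POWER`, `parse_expr` and `evaluate` are hypothetical and not taken from the article's code; the parser evaluates as it goes rather than building a tree, purely to keep the example short.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # "NUMBER", "OP" or "EOF"
    text: str


# One alternation with named groups replaces a long if/elif chain.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/()]))")

# Binding power of each infix operator; higher binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}


def tokenize(source):
    """Return a list of Tokens, always ending with an EOF marker."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at or after offset {pos}")
        tokens.append(Token(m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(Token("EOF", ""))
    return tokens


def parse_expr(tokens, pos=0, min_bp=0):
    """Pratt-style parse that evaluates on the fly; returns (value, next_pos)."""
    tok = tokens[pos]
    pos += 1
    # Prefix position: a number, a parenthesised expression, or unary minus.
    if tok.kind == "NUMBER":
        left = int(tok.text)
    elif tok.text == "(":
        left, pos = parse_expr(tokens, pos, 0)
        if tokens[pos].text != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    elif tok.text == "-":
        operand, pos = parse_expr(tokens, pos, 30)  # tighter than * and /
        left = -operand
    else:
        raise SyntaxError(f"unexpected token {tok.text!r}")
    # Infix position: consume operators that bind tighter than min_bp.
    while True:
        op = tokens[pos]
        bp = BINDING_POWER.get(op.text)
        if op.kind != "OP" or bp is None or bp <= min_bp:
            return left, pos
        right, pos = parse_expr(tokens, pos + 1, bp)
        if op.text == "+":
            left = left + right
        elif op.text == "-":
            left = left - right
        elif op.text == "*":
            left = left * right
        else:
            left = left / right


def evaluate(source):
    tokens = tokenize(source)
    value, pos = parse_expr(tokens)
    if tokens[pos].kind != "EOF":
        raise SyntaxError(f"unexpected token {tokens[pos].text!r}")
    return value
```

With these definitions, `evaluate("1 + 2 * 3")` gives 7 and `evaluate("8 - 3 - 2")` gives 3. Adding an operator only means adding an entry to the binding-power table and a branch for its action, which is the extensibility the Pratt approach is usually recommended for.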

benhoyt2y ago

> modern Python supports the `datatype` decorator

I presume you mean the @dataclass decorator?

tomp2y ago

indeed :D

gabrielsroka2y ago· 2 in thread

Actual title "Building a Toy Programming Language in Python"

codetrotter2y ago

I think using a different title on HN is fine when OP is the author, such as here

gabrielsroka2y ago

I noticed that after I posted it. But I still have a "web" button in my HN browser, and it doesn't work unless the title is correct.

Of course, the author can change the title on the website as well.

KAdot2y ago· 1 in thread

Great article. If someone is looking for a more advanced example, some time ago, as an exercise, I created an interpreter for a Python-like language in Python https://github.com/akrylysov/abrvalg

PixyMisa2y ago

Thank you for this. It's good to see examples that go beyond the toy stage but are still this readable.

intalentive2y ago· 1 in thread

I love Python but after taking PL in Racket (a Lisp-like language) it's hard to imagine implementing toy languages otherwise. Seems like Python 3.10's match statements might come in handy.

nighmi2y ago

The best Racket textbook (Beautiful Racket) is even focused around building new languages!

mr_00ff002y ago· 1 in thread

Great article. I’ve always wanted to try and build a toy language for the fun of it, great to see something that breaks it down so well.

Curious if anyone has any additional thoughts to add onto this for anyone who wants to build one themselves?

I have experience in C++, so will likely use that or Rust (I imagine Rust’s built-in methods might make it way easier at least for the parser, although idk if ownership rules get painful for work like this).

benhoyt2y ago

Yes, my thought is to read Bob Nystrom's "Crafting Interpreters". It's excellent, in-depth but not academic, and the writing style is great. https://craftinginterpreters.com/ He also has a lot of good thoughts about language design.

BoppreH2y ago

This is a really fun exercise, I recommend every programmer try it once. You can replace most of the tokenization code with re.Scanner[1], which also allows you to have strings without worrying about `code.split()` messing them up.

[1] https://news.ycombinator.com/item?id=36517749
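
A minimal sketch of the `re.Scanner` approach, assuming a made-up token set; it is not taken from the linked comment or from the article. `re.Scanner` is an undocumented class in the standard `re` module, so treat its behaviour as something to verify on your Python version. The `scanner` lexicon and the `tokenize` helper are hypothetical names.

```python
import re

# Each entry is (pattern, action). The action receives the scanner and the
# matched text; an action of None drops the match (used for whitespace).
scanner = re.Scanner([
    (r'"[^"]*"', lambda s, text: ("STRING", text[1:-1])),
    (r"\d+", lambda s, text: ("NUMBER", int(text))),
    (r"[A-Za-z_]\w*", lambda s, text: ("NAME", text)),
    (r"[-+*/=]", lambda s, text: ("OP", text)),
    (r"\s+", None),
])


def tokenize(code):
    """Return a list of (kind, value) pairs, or raise on unmatched input."""
    tokens, remainder = scanner.scan(code)
    if remainder:
        raise SyntaxError(f"cannot tokenize: {remainder!r}")
    return tokens
```

Here `tokenize('print "hello world" + 42')` keeps `hello world` as a single STRING token, whereas splitting the source on whitespace would break it in two.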

kdamica2y ago

Grinberg is great. The Flask Mega-Tutorial taught me so much and I’ve recommended it so many times.

frgfm2y ago

Really great content, and an interesting exercise! A while back, I wanted to do something similar but built on top of a low-level language to get good performance on a specific use case. I'd love to see an article about something built on top of Rust or C (like a retrospective analysis of the successes and mistakes of Python over the years).

kmstout2y ago

Ignoring all the work implementing a parser generator, here's a crappy language in fewer than 100 lines:

https://reindeereffect.github.io/2019/01/16/index.html
