I'm not a fan of parser generators. I find them restrictive. It is also notable that most mainstream languages do not use tools like YACC; they have "handmade" parsers and compilers.
This may seem like I'm suggesting a pie-in-the-sky, from-the-ground-up approach for something that may never need such control. Could be. But I don't feel that learning YACC is easier than learning how to parse in general. I think YACC and similar tools hold people back from truly expressing themselves with custom languages, and wasn't that the whole point of writing a language in the first place?
Now, let's set that aside for a moment. Let's say that after years of work implementing your language, you somehow get one prospective user.
That person will ask you about: syntax highlighting, linting, code navigation, testing, packaging, documentation, operating systems/architecture support, etc. Who is going to contribute all that if you are busy writing everything by hand?
Unless you have as much time as Terry Davis, you are better off starting by piggybacking on something that exists. Then, once you have an idea that scales, you can convince other people to help you and have a successful project. Then you can have a viable ecosystem that people can comfortably join.
Alternatively, you iterate using a cheaper method, and once you reach a point at which you are content with the syntax, you can consider writing a compiler for it.
You can also avoid iterating on the syntax so you can ship a compiler as fast as possible, but users won't like the syntax and won't use your language.
- the `tokens` method returning a variable-length tuple is a very bad idea; modern Python supports the `dataclass` decorator (from the `dataclasses` module), which should be used in this case
- I'd strongly recommend using a regex for lexing; much easier to get it right than a long if/elif/else block
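- I'd recommend refactoring the code and implementing the `next_token` logic so that it works for any generator; an example implementation could be (WARNING: I wrote this in 5 minutes, it would require extensive unit tests!):

    import itertools

    class Peek:
        # Iterator wrapper: `current` is the item the next call will return (None at the end).
        def __init__(self, it):
            self.cur, self.nxt = itertools.tee(it, 2)
            self.current = next(self.cur, None)  # cur runs one item ahead of nxt
        def __iter__(self):
            return self
        def __next__(self):
            item = next(self.nxt)  # raises StopIteration once the input is exhausted
            self.current = next(self.cur, None)  # None instead of losing the last item
            return item
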
- statements with a "prefix" (like `print X` or `if Y`) are very easy to parse (just use the lookahead token!) but when you get to parsing expressions, I strongly recommend using a Pratt parser (extensible operator precedence parsing)
Of course, the author can change the title on the website as well.
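To make the review points above (a dataclass for tokens, a regex for lexing, a Pratt parser for expressions) a bit more concrete, here is a rough sketch of how they might fit together. This is not the code under review: the toy grammar (integers, `+ - * / ^`, parentheses) and every name in it (`Token`, `TOKEN_RE`, `tokenize`, `BINDING`, `Parser`, `parse`) are made up for illustration, and it builds a nested-tuple tree rather than evaluating anything.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    kind: str  # "NUM", "OP", "LPAREN", "RPAREN" or "EOF"
    text: str


# One named group per token kind; whitespace is matched but skipped.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+)
  | (?P<OP>[-+*/^])
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(src: str) -> Iterator[Token]:
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup != "WS":
            yield Token(m.lastgroup, m.group())
    yield Token("EOF", "")


# (left, right) binding powers. right > left gives left-associativity,
# right < left gives right-associativity (used for "^").
BINDING = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21), "^": (31, 30)}


class Parser:
    def __init__(self, src: str):
        self.tokens = tokenize(src)
        self.current = next(self.tokens)  # one-token lookahead

    def advance(self) -> Token:
        tok = self.current
        self.current = next(self.tokens, Token("EOF", ""))
        return tok

    def expression(self, min_bp: int = 0):
        tok = self.advance()
        if tok.kind == "NUM":
            left = int(tok.text)
        elif tok.kind == "LPAREN":
            left = self.expression(0)
            if self.advance().kind != "RPAREN":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Keep absorbing operators that bind at least as tightly as min_bp.
        while self.current.kind == "OP":
            op = self.current.text
            left_bp, right_bp = BINDING[op]
            if left_bp < min_bp:
                break
            self.advance()
            left = (op, left, self.expression(right_bp))
        return left


def parse(src: str):
    parser = Parser(src)
    tree = parser.expression()
    if parser.current.kind != "EOF":
        raise SyntaxError(f"unexpected token {parser.current!r}")
    return tree
```

With this sketch, `parse("1 + 2 * 3")` gives `('+', 1, ('*', 2, 3))`. Adding an operator is one more entry in `BINDING` plus its character in the `OP` group, which is the "extensible" part of Pratt parsing.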
Curious whether anyone has additional thoughts on this for people who want to build one themselves?
I have experience in C++, so I will likely use that or Rust (I imagine Rust's built-in methods might make it way easier, at least for the parser, although idk if the ownership rules get painful for work like this).