Designing a Language Without a Parser

(thunderseethe.dev)

114 points · thunderseethe · 2y ago · 127 comments

electroly2y ago· 15 in thread

> However, once I start constructing a parser, progress slows to a crawl

I can't really relate here. The parser is the easiest part of a compiler; the work only increases from there. I feel like if you ran out of steam at the parser, you never had enough steam to write a whole compiler. I don't think removing the parser will take you across the finish line if you otherwise were running out of steam.

My advice is to write your language in vertical slices. Write the parsing, semantic checking, and code generation for the simplest features first and progressively add feature slices, rather than trying to write the entire parser for a fully-baked language before proceeding. Consider including "print" as a built-in statement so you can print things out (and thus write tests) before you have working expressions and function calls.
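To make the vertical-slice advice concrete, here is a minimal sketch, assuming a toy language whose only statement is `print <int>`. The function names (`parse`, `check`, `emit`, `run`, `compile_and_run`) and the two-instruction stack machine are hypothetical, invented for this illustration. The point is only that every phase exists end to end before the language grows.

```python
# First vertical slice: a toy language whose only statement is `print <int>`.
# All names and the target "instruction set" are illustrative assumptions.

def parse(source):
    """Parse lines of the form `print <token>` into ("print", token) tuples."""
    program = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        if len(words) != 2 or words[0] != "print":
            raise SyntaxError(f"line {lineno}: expected `print <int>`")
        program.append(("print", words[1]))
    return program

def check(program):
    """Semantic check: every printed operand must be an integer literal."""
    checked = []
    for op, arg in program:
        try:
            checked.append((op, int(arg)))
        except ValueError:
            raise TypeError(f"`{arg}` is not an integer literal") from None
    return checked

def emit(program):
    """Code generation: lower each statement to two stack-machine instructions."""
    code = []
    for _op, value in program:
        code.append(("PUSH", value))
        code.append(("PRINT",))
    return code

def run(code):
    """Tiny interpreter for the generated code; returns the printed lines."""
    stack, output = [], []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "PRINT":
            output.append(str(stack.pop()))
    return output

def compile_and_run(source):
    return run(emit(check(parse(source))))
```

Because `run` returns the printed lines, a test is just a comparison of `compile_and_run(source)` against the expected output, and the next slice (say, addition) extends each of the four functions a little.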

smasher1642y ago

> I don't think removing the parser will take you across the finish line if you otherwise were running out of steam.

Everyone's different. The problem with parsing isn't the difficulty, but rather the potential for endless bikeshedding. You're having to make a ton of opinionated decisions that in turn produce more questions about your syntax. If your personality is like mine in that it's a bit obsessive about "completing" a phase, then parsing feels like an endless quagmire. In comparison, AST -> type inference -> codegen feels more structured and straightforward.

LoganDark2y ago

> The problem with parsing isn't the difficulty, but rather the potential for endless bikeshedding. You're having to make a ton of opinionated decisions that in turn produce more questions about your syntax.

Considering the author is talking about an endless graveyard of abandoned projects it kind of sounds like ADHD. I have ADHD and have a similar problem.

I still do endless bikeshedding where I'll do things like insist on zero memory allocations or streams/iterators over slices and those arbitrary limitations will completely compromise the project. Either I never finish it because I can't get it to work, or I get it to work but it sucks because of stuff like "the amount of work to avoid memory allocation might be much larger than the work of just making a damn memory allocation".

giovannibonetti2y ago

That explains why Lisp is so convenient for implementing DSLs. Homoiconicity skips all the bike shedding around syntax and parsing.

rewmie2y ago

I fail to see your point. OP was commenting on the technical difficulty of actually rolling out a parser, but you're not arguing the technical side. You're arguing the project management sides where "endless bike shedding" results in code churn. That's not the parser's fault, but the person's fault. You're bound to get hung up on details regardless of what part of the project you're working on, because it's not the parser compelling you to get hung up on details.

thunderseetheOP2y ago

This captures exactly the sentiment I was trying to express.

Thanks for chiming in!

danybittel2y ago

You need to better define the goal of your PL. If you have an exact vision of your end user and what she or he is going to use it for, it becomes easier to navigate any trade offs.

throwaway8943452y ago

I disagree. Parsers are tedious to write and obscure corner cases crop up over the course of development that make us completely change our approach to parsing or the grammar. You have to be pretty expert to get the grammar and parsing to work right the first time. Type inference is tough, but it’s not difficult the way writing a parser is. Codegen (suboptimal but functional) is even easier.

But yeah, writing a vertical slice and implementing a print builtin are good tips.

brabel2y ago

> rather than trying to write the entire parser for a fully-baked language before proceeding.

I see programmers, even experienced ones, making this mistake all the time when implementing larger changes. They try to get the whole thing implemented at once even when it's obvious from the outset that there are multiple sub-features that can be more easily implemented separately, so one can focus all their efforts on it, make sure it's well tested and clean, before moving on to the next sub-feature.

armchairhacker2y ago

My first language, I spent months on the parser and then gave up. As I learned more I got better until now, I can write a hand-rolled parser for a simple language in under a day. There's a technique to making syntax which is easy to read/write and easy to parse with a look-ahead parser.

For more complicated languages, there are tools like tree-sitter and ANTLR4. You can even extend an existing tree-sitter grammar to augment a language without having to re-write the base language parser.

KRAKRISMOTT2y ago

I have a better idea. Do the code gen while writing out the parser. It won't be the fastest language and the codebase would be a mess and you will most likely have correctness problems but it's the fastest way to production. The classic 3 stage compiler design (tokenizing/lexing/parsing -> AST -> optional lowering/bytecode etc. -> JIT/code gen) is only important if you want to do semantic analysis and optimization. They are kinda redundant for prototyping (plus cranelift and LLVM exist, you can focus on the backend after you get your language design in order). Do a quick peephole optimization pass on the generated instructions if things get too slow.

If we are being honest here, most people just want their own better Python/Go/Rust. I doubt your average engineer is truly interested in the subtleties of compiler design and memory layout optimization. A single pass compiler, built like in the olden days of yore, is the best option for most.

https://en.m.wikipedia.org/wiki/One-pass_compiler

Leave the 3 stage design for homework (or if you are being paid to do so).
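A rough sketch of what "code gen while parsing" can look like, assuming a tiny arithmetic language with integers, `+`, `-`, `*` and parentheses. The class name `OnePassCompiler` and the opcodes (`PUSH`, `ADD`, `SUB`, `MUL`) are made up for this example. The recursive descent parser appends stack-machine instructions as it recognizes each construct, so no AST is ever built.

```python
import re

# Tokens: integers, the operators + - *, and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class OnePassCompiler:
    """Recursive descent parser that emits code as it parses; no AST is built."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []  # output: a flat list of stack-machine instructions

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):  # expression := term (("+" | "-") term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            self.term()
            self.code.append(("ADD",) if op == "+" else ("SUB",))

    def term(self):  # term := factor ("*" factor)*
        self.factor()
        while self.peek() == "*":
            self.advance()
            self.factor()
            self.code.append(("MUL",))

    def factor(self):  # factor := INT | "(" expression ")"
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            self.code.append(("PUSH", int(token)))
        elif token == "(":
            self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected `)`")
        else:
            raise SyntaxError(f"unexpected token `{token}`")

def compile_expr(text):
    compiler = OnePassCompiler(tokenize(text))
    compiler.expression()
    if compiler.peek() is not None:
        raise SyntaxError(f"unexpected token `{compiler.peek()}`")
    return compiler.code

def run(code):
    """Execute the emitted instructions on a stack and return the result."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        right, left = stack.pop(), stack.pop()
        if instr[0] == "ADD":
            stack.append(left + right)
        elif instr[0] == "SUB":
            stack.append(left - right)
        elif instr[0] == "MUL":
            stack.append(left * right)
    return stack.pop()
```

The trade-off mentioned above is visible here: since nothing tree-shaped survives parsing, any analysis or optimization has to work on the flat instruction list.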

tester7562y ago

Parser for "4-lulz language" or something that will also be used in IDEs, so has robust error recovery, can perform partial updates, etc?

I feel it's just that it is possible to say that work on the parser side is completed

meanwhile optimizations? you can probably endlessly improve stuff

electroly2y ago

Talking about the same thing as the article: hobby/learning languages. If you click the links to the author's projects you can see they have yet to finish a compiler or really even come close. Definitely not talking about sophisticated production-grade languages here; OP is trying to complete their first compiler at all. I think writing compilers is a really neat and useful learning project but you have to be smart about not biting off more than you can chew.

I would absolutely recommend not supporting incremental compilation or error recovery in your first compiler. Just stop everything at the first error. Save that for your second hobby compiler, or better yet, the first commercial compiler that you get paid to work on.

paulddraper2y ago

I think you're reaching higher

kevingadd2y ago

Yep, it was tremendously correct to start with (iirc) 'assert_eq' and 'print' as the two first statements in my language when building out the compiler. It meant I could start writing a test suite and assemble a larger and larger set of functionality as I went.

On the other hand I do really hate the task of writing parsers, so I can relate to people who think it's the worst/most difficult part. But I think other parts are probably harder, like getting type system stuff right.

musicale2y ago

> Consider including "print" as a built-in statement

I really wish Python 3 had taken that advice. ;-)

0823498723498722y ago· 5 in thread

While I agree designing syntax before you know your semantics is unwise, please consider also that there are several actually parserless languages extant: lisps, forths, APLs, smalltalks, various Edinburgh languages, etc.

(Esterel —IIRC— is the only language of which I'm aware that explicitly has two syntaxes, one traditionally parser based and one that, in principle, could be parserless)

zokier2y ago

I don't like the term parserless in this context. It's not like you can just mmap a lisp source file and cast it to the Ast type from the article.

Parsing something might be trivial, but it's still parsing.

mostlylurks2y ago

"Not parsed ahead of time" might be the better qualification. At least in some forths you can cease parsing the file (in the "outer interpreter") at any point and perform any kind of computation or IO that you've previously defined, and based on that do whatever you want with the rest of the text of the source file (or just the next couple of tokens if you want), including parsing it manually in some other manner than what the outer interpreter would do by default. I haven't gone that deep in lisp, but I hear reader macros allow something similar, though perhaps they might be more restrictive by requiring a transformation into valid lisp trees / values, whereas forth allows you to just do whatever, and if that happens to have the side effect of adding new functions to the dictionary, so be it.
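A small sketch of that idea, in Python rather than Forth and not an implementation of any Forth standard: the outer interpreter reads one token at a time and passes every word a handle on the remaining input, so a word can consume the following text however it likes. The `Interpreter` class and its handful of words are assumptions made for this illustration.

```python
# A Forth-flavoured "outer interpreter" sketch: words receive the remaining
# token stream, so any word may consume source text however it likes.
# This illustrates the idea only; it is not standard Forth.

class Interpreter:
    def __init__(self):
        self.stack = []
        self.output = []
        self.words = {
            "+": self._add,
            "dup": self._dup,
            ".": self._dot,
            "(": self._comment,
            ":": self._define,
        }

    def run(self, source):
        self.execute(iter(source.split()))
        return self.output

    def execute(self, tokens):
        # The outer interpreter: read one token, act on it, repeat.
        for token in tokens:
            if token in self.words:
                self.words[token](tokens)  # the word may pull more tokens itself
            else:
                self.stack.append(int(token))

    def _add(self, tokens):
        right, left = self.stack.pop(), self.stack.pop()
        self.stack.append(left + right)

    def _dup(self, tokens):
        self.stack.append(self.stack[-1])

    def _dot(self, tokens):
        self.output.append(str(self.stack.pop()))

    def _comment(self, tokens):
        # Consume raw tokens up to `)` without interpreting them.
        for token in tokens:
            if token == ")":
                break

    def _define(self, tokens):
        # `: name body... ;` reads its own name and body from the input stream.
        name = next(tokens)
        body = []
        for token in tokens:
            if token == ";":
                break
            body.append(token)
        self.words[name] = lambda _tokens: self.execute(iter(body))
```

For example, `Interpreter().run(": double ( n -- 2n ) dup + ; 21 double .")` defines `double` at run time and then uses it; neither `:` nor `(` is special to the outer loop, they are ordinary dictionary entries that happen to read ahead.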

0823498723498722y ago

in that case, pretend I wrote "grammarless"

(in the examples above —which missed the prologs— I'm pretty sure the parsing is trivial on the order of "you can see everything that handles 'parsing' without needing to scroll a window", and in several of those examples it'd still be true even if your windows were only 25 lines long)

jdougan2y ago

It wasn't compiled, but the way Smalltalk-72 did parsing (integrated into the rest of the execution) is worth understanding.

http://worrydream.com/EarlyHistoryOfSmalltalk/

prmph2y ago

I had the same thought as the OP about a language I wanted to design to explore some concepts, and thought of using JSON as the syntax, with the grammar defined as a JSON schema.

wpietri2y ago· 4 in thread

Ooh, I like this. Too many people start projects at the logical beginning. But what you really want early on in a project is to maximize speed of exploration of the interesting parts.

To me there's a clear analogy with startups. The naive conception of starting a company is that you get a pile of money so you can hire a bunch of people and create important infrastructure. But with startups, you're trying something new, so the most efficient use of time is to find the riskiest hypotheses and test them as directly as possible. That often involves doing things that seem wrong if you proceed in the "logical" way. E.g., I knew a successful UGC company that didn't implement accounts and logins until like 6 months in. But that was fine, because actual accounts were not needed to figure out whether the business worked.

dcz_self2y ago

Start in the middle. It's the most interesting part, too. After all, that's the core of your idea.

I don't know where I heard this, but the idea is so important to me that I saved it on my blog: https://dorotac.eu/posts/in_the_middle/

mjcohen2y ago

That is also a good way to start an improv scene. Figure out later, if at all, what is really going on.

lelanthran2y ago

> Start in the middle. It's the most interesting part, too. After all, that's the core of your idea.

For a programming language? Maybe if you are designing your language by feature list.

What if you are designing a programming language for ergonomics instead?

Let's be real - the differentiating factor in any modern language design is the syntax, not the features. They all mostly support a similar cross-section of features, in terms of "getting things done".

What you are really designing is a competitor to the existing languages, in which case it is beyond the scope and effort of a lone developer to match feature-for-feature of modern languages.

My experience of lone-wolf programming languages is all the same ... namely ...

Even if you do have one, single, differentiating feature, people aren't going to adopt unless you have all the other features they want. Doesn't matter how good your feature is if your language is missing some feature that people like in current mainstream languages.

You should also be careful of thinking that a single good feature will cause a little adoption; if it's any good the existing languages will simply adopt it!

Another path into darkness is thinking that the batteries included is so different to current offerings that people will adopt it, such as that recent post on HN about a language developed for cloud by the author of CDK. There's nothing in that language that can't be implemented as a library for existing languages.

For programming languages alone, going feature-first is a good way to produce an obscure language that no one is interested in. Without even a small community, the original dev themselves won't use it.

Where a new language makes sense is in ergonomics, not in features.

Can you make the syntax such that people onboard quickly? Can your syntax support something complicated in a manner that the most simple-minded developer can understand? Is your syntax amenable to collaboration? Can it be easily parsed in pieces for IDEs? Will the output be package-distributable or module-distributable? Can you ensure easy GDB integration? What build mechanism can be used (for reading the sources and figuring out dependencies)?

Syntax is the major difference between writing in Kotlin and writing in Java.

My new language project, I'm still trying to nail down what the syntax should look like. I have no problem documenting the tree to support features I want, but I find that settling on what good syntax looks like to the majority of corporate developers is really really difficult.

Compare with, from an AST, emitting code for some advanced language constructs. That's almost a mechanical effort that I think I can delegate a lot of to ChatGPT.

Designing universally acceptable syntax, on the other hand, is a lot more complex and requires actual human decision-making.

danybittel2y ago

That is sort of the idea of the MVP (minimum viable product).

I prefer the analogy of painting. You start with collecting references, exploring ideas in a sketchbook, make color tests, draw outlines on canvas, use big brushes for colors, refine with smaller and smaller details.

The problem is that programming is all details / only details. There is no easy way to use big brushstrokes, so you have to improvise and not lose the overview. It doesn't help that engineers love details.

djedr2y ago· 4 in thread

This is one thing I designed Jevko[0][1] for.

If you have an idea for a format or a language and would like to quickly start hacking on the layer above the syntax, Jevko is an option.

It's meant to be even simpler and more hackable than S-expressions.

It gets you from a string to a tree in the least amount of steps.

See here[2] if interested.

Happy hacking!

[0] https://jevko.org/

[1] https://djedr.github.io/posts/jevko-2022-02-22.html

[2] https://gist.github.com/djedr/151241f1a9a5bc627059dd9b23fc74...
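To give a feel for "string to tree in few steps", here is a simplified sketch of a square-bracket tree parser. It is only loosely modeled on the shape of such syntaxes and should not be read as the Jevko specification or its reference implementation: escape handling is omitted, and the function name `parse_tree` and the dictionary layout are assumptions made for this example.

```python
def parse_tree(text):
    """Parse bracketed text into {"subs": [(prefix, subtree), ...], "suffix": str}."""
    root = {"subs": [], "suffix": ""}
    stack = []  # (parent node, prefix) pairs for brackets that are still open
    node, buffer = root, []
    for index, char in enumerate(text):
        if char == "[":
            # Text collected so far becomes the prefix of a new subtree.
            stack.append((node, "".join(buffer)))
            node, buffer = {"subs": [], "suffix": ""}, []
        elif char == "]":
            if not stack:
                raise SyntaxError(f"unexpected `]` at offset {index}")
            # Text collected so far is the trailing text of the closing subtree.
            node["suffix"] = "".join(buffer)
            parent, prefix = stack.pop()
            parent["subs"].append((prefix, node))
            node, buffer = parent, []
        else:
            buffer.append(char)
    if stack:
        raise SyntaxError("missing `]` at end of input")
    node["suffix"] = "".join(buffer)
    return root
```

With this sketch, `parse_tree("f[x][y]")` yields two sibling subtrees, the first prefixed by `f` and the second by an empty string; giving those shapes a meaning is left to the layer above.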

Jtsummers2y ago

With the square brackets it has a bit of a Rebol feel to it. Was that intentional or coincidental?

djedr2y ago

I suppose a bit of both.

I was more directly inspired by Lisps, but I do prefer the original M-expressions and the syntactic choices that REBOL and Red make.

I think placing the operator before the opening bracket better emphasizes its special significance and can reduce nesting for constructs like `f[x][y]` (vs. `((f x) y)` in Lisps). Square brackets somehow seem more aesthetically pleasing to me. And there is a practical reason to prefer them, especially if your syntax uses only one kind of brackets -- square brackets are the easiest to type on an average keyboard.

So REBOL-like syntax is nicer. As were M-expressions. They probably didn't catch on, because they were not minimal enough, compared to S-expressions. And maybe because S-expressions were fully implemented first.

marssaxman2y ago

Nice project. Thank you for sharing.

djedr2y ago

Thanks! :)

hardwaregeek2y ago· 3 in thread

One option you could also try is to use an existing language's syntax. Plenty of languages have high quality parser libraries by now, like swc/rome/acorn for JavaScript, rustc_parse for Rust, etc. Of course syntax is influenced by semantics, so you'll end up wanting to remove or add syntax, but you could probably get decently far before that ends up a problem.

User232y ago

> Of course syntax is influenced by semantics

Only to the extent that AST structure depends on syntax. For something like s-expressions defining new semantics never requires new syntax since arbitrary trees suffice to syntactically express any AST.

dfox2y ago

One of my exploratory projects, back when I still thought that an academic career made sense, involved replacing Smalltalk's stack-based bytecode with incrementally transformed S-expression trees. Naturally, the first thing I set out to do was write an S-expression reader and writer in Smalltalk. My advisor at the time told me that it was pointless busywork and I should just use ST literal syntax; it looked somewhat ugly with all the # there, but saved a lot of time.

Well, 12 years later (i.e. early 2023) I realized that I don't really have any kind of cool side project and started implementing the same idea in C (with the added goal of the VM being natively multithreaded with fine-grained locking along the lines of JikesRVM and WebKit). Well, I have stub implementations of classes needed for the AST representation and S-expression reader and writer…

hardwaregeek2y ago

Sure, but if you're using an existing language grammar, it will almost certainly be influenced by the language's semantics, unless that language happens to be lisp (and even then most lisps are not pure s-expressions)

pie_flavor2y ago· 2 in thread

One great resource for making a language without a parser, not just delaying it but never writing it at all, is JetBrains' MPS[0]. It lets you make languages in a projectional editor: rather than a syntax parsed into an AST, you edit the AST directly in a series of GUI cells pretending to be an editor, much as in MS Word, so rather than writing a parser from syntax to AST, you write a renderer from AST to syntax.

Benefits of this approach include there being no such thing as a syntax ambiguity, language extensions being as easy to make as libraries, the language being easy to write by non-programmers, and a JetBrains IDE for your language - at the cost of not being able to use any other editor.

A great example of such a language is mbeddr[1].

[0]: https://www.jetbrains.com/mps/

[1]: http://mbeddr.com/

mathisfun1232y ago

Just last week I went through the calculator tutorial and I gotta say nah this ain't it.

I had hope even though I knew what the deal with projectional was going in and indeed I was disappointed. The number of proprietary "dsl"s you have to grok to be proficient at using MPS is just mind-boggling and that's coming from someone that designs dsls for a living. Like by the time you get to codegen you're already 3 or 4 deep. And it's only for codegen, where you're generating Java, that you get the full projectional editor treatment. Everything else is just a gui form for some small "dsl" with effectively dropdowns, so you might as well just call it an API for a configuration system rather than a language. Like there's zero composition for the structure "language" and the editor "language". You're literally just toggling forms.

It's just a complete turn-off because even if it is "powerful" it's completely non-portable - you cannot ship anything without shipping MPS to orchestrate the cascade of dsls.

I tried to get mbeddr to work but could not even though I can drive gradle and etc fine.

Overall really disappointing.

oaiey2y ago

Both comments are right. Projectional Editors are the thing to avoid parsers (instead of parsing you spend time in the editor though). And MPS is indeed a beast.

glonq2y ago· 2 in thread

> I can’t tell you why I keep returning to this venture when I’ve failed at it so many times.

Sorry if this sounds stupid or obvious, but with this kind of thing I find that it's easier to cross the finish line if you maintain humble goals. Focus on just getting a working end-to-end MVP. Refine and enhance it down the road; don't get stuck trying to make version 0 an awesomely praiseworthy effort.

thunderseetheOP2y ago

Absolutely! That's what I'm attempting this time. I'm hoping if I start with literally just the lambda calc + integer literals I can get something working e2e and layer stuff on top from there

glonq2y ago

If this was reddit, I would invoke a !remindme to check in on you in a month.

Hoping to hear a positive update from you in the near future!

kibwen2y ago· 2 in thread

This is one of the cases where you actually do want Lisp-style s-expressions, because they don't need any real parsing; functionally speaking, they are already the AST. (This is why you sometimes hear people saying that Lisp "doesn't have syntax".)
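As a rough illustration of how little reading s-expressions takes, here is a minimal sketch of a reader that turns text into nested Python lists, plus an evaluator for a tiny arithmetic subset that walks those lists directly. The names `read` and `evaluate` and the supported operators are assumptions for this example; real Lisp readers also handle strings, quoting, comments and more.

```python
def read(text):
    """Read one S-expression from text into nested lists, ints, and symbol strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    expr = _read_from(tokens)
    if tokens:
        raise SyntaxError("unexpected trailing input")
    return expr

def _read_from(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(_read_from(tokens))
        if not tokens:
            raise SyntaxError("missing `)`")
        tokens.pop(0)  # discard the closing paren
        return items
    if token == ")":
        raise SyntaxError("unexpected `)`")
    try:
        return int(token)
    except ValueError:
        return token  # anything else is treated as a symbol

def evaluate(expr):
    """Evaluate a tiny arithmetic subset directly on the tree the reader produced."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        raise NameError(f"unbound symbol `{expr}`")
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    if op == "+":
        return sum(values)
    if op == "*":
        result = 1
        for value in values:
            result *= value
        return result
    raise NameError(f"unknown operator `{op}`")
```

The reader is still a parser, just a small one; the nested lists it returns serve as the AST without any further transformation.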

fiddlerwoaroof2y ago

> This is why you sometimes hear people saying that Lisp "doesn't have syntax".

The other reason is that a language like Common Lisp is defined in terms of the data-structures used by the language and the language has no unique text representation: the “default” reader uses a slightly extended version of s-expressions, but any data structure in the language can be evaluated and any transform from text to data structures can be a textual syntax for Common Lisp.

patrec2y ago

Exactly. And if you are paren-phobic, you could also have a look at Postscript, Forth or even Smalltalk/Self -- the last two are about the most minimal you can get with infix operators (but no precedence) and "keyword-arguments" (sort of), well, unless you want to go all-in on infix, and do APL.

zarathustreal2y ago· 1 in thread

Jtsummers2y ago

Also available for free from the author: https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching... [pdf]

Course page: https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...

kazinator2y ago

> Without fail by the time I’ve managed to produce an Abstract Syntax Tree (AST) I’ve lost all steam.

Lisp helps with this. You have the syntax settled, and can concentrate on the language from the get-go with all your steam.

kccqzy2y ago

Here's an interesting anecdote in the same spirit of skipping uninteresting work. Back when I worked with a rather experienced Haskell developer, he also wanted to design a language but dreaded the part where a parser was needed. Besides that, he took the further shortcut of representing functions in his language using functions in the host language Haskell. That is to say, the AST for functions contains Haskell functions. This technique is called higher-order abstract syntax.

Naturally there are limitations here. There's not much you can do to poke into the structure of Haskell functions other than evaluating them, so any sort of optimization and code gen work must happen by evaluating the function in Haskell, which gives you more of a challenge in designing the AST.
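A minimal sketch of higher-order abstract syntax, written in Python rather than Haskell, so the host functions are Python callables. The node names (`Lit`, `Add`, `Lam`, `App`) and the `evaluate` function are assumptions for this illustration, not the design from the anecdote.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

@dataclass
class Lam:
    # Higher-order abstract syntax: the body is a host-language function from
    # expression to expression, so the AST needs no variable names or scopes.
    body: Callable[["Expr"], "Expr"]

@dataclass
class App:
    func: "Expr"
    arg: "Expr"

Expr = Union[Lit, Add, Lam, App]

def evaluate(expr):
    """Reduce an expression to a value (a Lit or a Lam)."""
    if isinstance(expr, (Lit, Lam)):
        return expr  # literals and lambdas are already values
    if isinstance(expr, Add):
        left, right = evaluate(expr.left), evaluate(expr.right)
        return Lit(left.value + right.value)
    if isinstance(expr, App):
        func = evaluate(expr.func)
        # Substitution is just a host-language function call.
        return evaluate(func.body(evaluate(expr.arg)))
    raise TypeError(f"not an expression: {expr!r}")

# Usage: a doubling function applied to 21.
double = Lam(lambda x: Add(x, x))
result = evaluate(App(double, Lit(21)))
```

The limitation described above shows up immediately: `double.body` is an opaque callable, so the only way to see what the lambda does is to apply it, for instance `double.body(Lit(1))`, and inspect the expression that comes back.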

That said so far the author's language resembles simply typed lambda calculus and this is generally too simple. The meat of the language hasn't actually been designed yet in this post.

porcoda2y ago

I like this. The focus on lexing/parsing for language implementation often overshadows the fact that the bulk of the work in any language effort is in the semantics and analysis (e.g., everything after one has a parse tree). On the production compiler I get paid to work on, front end work makes up at most 10% of the team time and effort. Even on experimental language projects that we occasionally play with, the front end is usually given minimal attention - just enough to have a syntax that we can use to start instantiating ASTs and doing the interesting work. More often than not, I punt on a front end altogether and just piggyback on some existing language (e.g., Haskell, ML, etc) and basically explore the semantics or analysis questions I'm interested in via a DSL-style embedding and ignore syntax altogether.

seanmcdirmid2y ago

The parser design space is really interesting, but recursive descent seems to be the best go to, and you don’t have to think about parsing theory much. Or you could go with no syntax (lisp or scheme style), or a structured language (but you’ll need to do your own editor for that).

You could also try doing an embedded DSL in Haskell or even Kotlin if you just want to get your feet wet.

carterschonwald2y ago

I always start with the AST and just work forward and backward from there.

Mikhail_Edoshin2y ago

I have been thinking for quite some time about using XML for this. It is pure AST you can relatively conveniently author as text.

Having nice syntax is a good thing, yet no matter what the syntax is, it ends up as an AST. And the structure of AST records and the ways they can be combined matters quite a lot, I would say more than text syntax. XML makes it visible and easy to experiment with without getting distracted by text syntax niceties.
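As a small illustration of treating XML as the AST, here is a sketch that evaluates an arithmetic expression written as elements, using Python's standard `xml.etree.ElementTree`. The element names (`add`, `mul`, `int`) and the `value` attribute are made up for this example; no schema is implied.

```python
import xml.etree.ElementTree as ET

# The "program" is authored directly as a tree; there is no custom parser.
SOURCE = """
<add>
  <int value="1"/>
  <mul>
    <int value="2"/>
    <int value="3"/>
  </mul>
</add>
"""

def evaluate(node):
    """Evaluate an expression element; the tag names are invented for this sketch."""
    if node.tag == "int":
        return int(node.attrib["value"])
    operands = [evaluate(child) for child in node]  # iterating yields child elements
    if node.tag == "add":
        return sum(operands)
    if node.tag == "mul":
        result = 1
        for operand in operands:
            result *= operand
        return result
    raise ValueError(f"unknown element <{node.tag}>")

def run(source):
    return evaluate(ET.fromstring(source.strip()))
```

Changing the structure of the AST means changing the element names and nesting, which makes it cheap to experiment with how nodes combine before committing to any surface syntax.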

Hardliner662y ago

I started something similar, but my goal is to create a projectional editor for the language, where syntax is more of a presentation problem than a parsing problem.

https://github.com/hardliner66/vormbaar/

ftomassetti2y ago

Honestly, designing a parser is easy: just start using ANTLR and perhaps add an AST layer later. However, if you do not want to go that route, I suggest looking into projectional editing, for example JetBrains MPS or Freon by Jos Warmer.
