I can't really relate here. The parser is the easiest part of a compiler; the work only increases from there. I feel like if you ran out of steam at the parser, you never had enough steam to write a whole compiler. I don't think removing the parser will take you across the finish line if you otherwise were running out of steam.
My advice is to write your language in vertical slices. Write the parsing, semantic checking, and code generation for the simplest features first and progressively add feature slices, rather than trying to write the entire parser for a fully-baked language before proceeding. Consider including "print" as a built-in statement so you can print things out (and thus write tests) before you have working expressions and function calls.
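To make the "vertical slice" idea concrete, here is a minimal sketch of what the very first slice might look like. Everything in it is an assumption for illustration: a hypothetical toy language whose only feature is `print <integer>`, Python source as the compilation target, and made-up function names (`parse`, `check`, `generate`, `compile_source`). The point is only that all three phases exist from day one, however thin they are.

```python
# Hypothetical toy language: every non-blank line is `print <integer>`.
# One thin slice through all three phases: parse -> check -> codegen.

def parse(source):
    """Parsing: turn source text into a list of ("print", token) nodes."""
    ast = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) != 2 or tokens[0] != "print":
            raise SyntaxError(f"line {lineno}: expected `print <integer>`")
        ast.append(("print", tokens[1]))
    return ast


def check(ast):
    """Semantic checking: the operand must be an integer literal."""
    checked = []
    for op, arg in ast:
        try:
            value = int(arg)
        except ValueError:
            raise TypeError(f"{arg!r} is not an integer literal") from None
        checked.append((op, value))
    return checked


def generate(ast):
    """Code generation: emit Python source (an assumed target language)."""
    return "\n".join(f"print({value})" for _, value in ast)


def compile_source(source):
    """Run the whole pipeline on one piece of source text."""
    return generate(check(parse(source)))
```

The next slice (say, `print 1 + 2`) would then touch all three functions a little, instead of finishing one phase before starting the next.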
Everyone's different. The problem with parsing isn't the difficulty, but rather the potential for endless bikeshedding. You have to make a ton of opinionated decisions that in turn produce more questions about your syntax. If your personality is like mine in that it's a bit obsessive about "completing" a phase, then parsing feels like an endless quagmire. In comparison, AST -> type inference -> codegen feels more structured and straightforward.
Considering the author is talking about an endless graveyard of abandoned projects it kind of sounds like ADHD. I have ADHD and have a similar problem.
I still do endless bikeshedding where I'll do things like insist on zero memory allocations or streams/iterators over slices and those arbitrary limitations will completely compromise the project. Either I never finish it because I can't get it to work, or I get it to work but it sucks because of stuff like "the amount of work to avoid memory allocation might be much larger than the work of just making a damn memory allocation".
I fail to see your point. OP was commenting on the technical difficulty of actually rolling out a parser, but you're not arguing the technical side. You're arguing the project management side, where "endless bikeshedding" results in code churn. That's not the parser's fault, but the person's fault. You're bound to get hung up on details regardless of what part of the project you're working on, because it's not the parser compelling you to get hung up on details.
Thanks for chiming in!
But yeah, writing a vertical slice and implementing a print builtin are good tips.
I see programmers, even experienced ones, making this mistake all the time when implementing larger changes. They try to get the whole thing implemented at once even when it's obvious from the outset that there are multiple sub-features that can be more easily implemented separately, so one can focus all their efforts on it, make sure it's well tested and clean, before moving on to the next sub-feature.
For more complicated languages, there are tools like tree-sitter and ANTLR4. You can even extend an existing tree-sitter grammar to augment a language without having to re-write the base language parser.
If we are being honest here, most people just want their own better Python/Go/Rust. I doubt your average engineer is truly interested in the subtleties of compiler design and memory layout optimization. A single-pass compiler, built like in the days of yore, is the best option for most.
https://en.m.wikipedia.org/wiki/One-pass_compiler
Leave the 3 stage design for homework (or if you are being paid to do so).
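For anyone who hasn't seen one, a rough sketch of the single-pass style: a recursive-descent parser that emits instructions while it parses and never builds an AST. The expression grammar (integers, `+`, `*`, parentheses), the instruction names (`PUSH`, `ADD`, `MUL`) and the little stack machine in `run` are all assumptions made up for this example, not anything from a real compiler.

```python
import re


def compile_expr(source):
    """Compile an arithmetic expression to stack-machine code in one pass."""
    # Simplification: characters that are not digits, +, *, or parentheses
    # are silently skipped by this tokenizer.
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0
    code = []  # instructions are appended here while parsing, no AST

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected )")
        elif tok.isdigit():
            code.append(("PUSH", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("MUL",))  # emitted as soon as both operands are done

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append(("ADD",))

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


def run(code):
    """Tiny stack machine, only here to check the emitted code."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()
```

With this, `compile_expr("1 + 2 * 3")` yields `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`, and the operator precedence falls out of the call structure rather than out of a tree.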
I feel it's just that you can actually say the work on the parser side is completed.
Meanwhile, optimizations? You can probably improve those endlessly.
I would absolutely recommend not supporting incremental compilation or error recovery in your first compiler. Just stop everything at the first error. Save that for your second hobby compiler, or better yet, the first commercial compiler that you get paid to work on.
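In code, "stop at the first error" can be as simple as one exception type and one `try` at the top. A sketch, assuming a made-up toy language where `let NAME` declares a name and `use NAME` references it; `CompileError`, `check_program` and `compile_or_report` are hypothetical names for this example.

```python
class CompileError(Exception):
    """Raised at the first problem; nothing tries to recover from it."""

    def __init__(self, lineno, message):
        super().__init__(f"line {lineno}: {message}")
        self.lineno = lineno


def check_program(source):
    """Toy checker: `let NAME` declares a name, `use NAME` references it."""
    declared = set()
    for lineno, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        if len(words) != 2 or words[0] not in ("let", "use"):
            raise CompileError(lineno, "expected `let NAME` or `use NAME`")
        keyword, name = words
        if keyword == "let":
            declared.add(name)
        elif name not in declared:
            # No attempt to continue and collect further errors.
            raise CompileError(lineno, f"undeclared name {name!r}")
    return declared


def compile_or_report(source):
    """Driver: report the first error and give up."""
    try:
        check_program(source)
    except CompileError as err:
        return f"error: {err}"
    return "ok"
```

Only the first bad line is ever reported, so none of the phases needs to cope with a partially broken program.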
On the other hand I do really hate the task of writing parsers, so I can relate to people who think it's the worst/most difficult part. But I think other parts are probably harder, like getting type system stuff right.
I really wish Python 3 had taken that advice. ;-)
(Esterel —IIRC— is the only language of which I'm aware that explicitly has two syntaxes, one traditionally parser based and one that, in principle, could be parserless)
Parsing something might be trivial, but it's still parsing.
(in the examples above —which missed the prologs— I'm pretty sure the parsing is trivial on the order of "you can see everything that handles 'parsing' without needing to scroll a window", and in several of those examples it'd still be true even if your windows were only 25 lines long)
To me there's a clear analogy with startups. The naive conception of starting a company is that you get a pile of money so you can hire a bunch of people and create important infrastructure. But with startups, you're trying something new, so the most efficient use of time is to find the riskiest hypotheses and test them as directly as possible. That often involves doing things that seem wrong if you proceed in the "logical" way. E.g., I knew a successful UGC company that didn't implement accounts and logins until like 6 months in. But that was fine, because actual accounts were not needed to figure out whether the business worked.
I don't know where I heard this, but the idea is so important to me that I saved it on my blog: https://dorotac.eu/posts/in_the_middle/
For a programming language? Maybe if you are designing your language by feature list.
What if you are designing a programming language for ergonomics instead?
Let's be real - the differentiating factor in any modern language design is the syntax, not the features. They all mostly support a similar cross-section of features, in terms of "getting things done".
What you are really designing is a competitor to the existing languages, in which case it is beyond the scope and effort of a lone developer to match feature-for-feature of modern languages.
My experience of lone-wolf programming languages is all the same ... namely ...
Even if you do have one, single, differentiating feature, people aren't going to adopt unless you have all the other features they want. Doesn't matter how good your feature is if your language is missing some feature that people like in current mainstream languages.
You should also be careful of thinking that a single good feature will cause a little adoption; if it's any good the existing languages will simply adopt it!
Another path into darkness is thinking that the batteries included is so different to current offerings that people will adopt it, such as that recent post on HN about a language developed for cloud by the author of CDK. There's nothing in that language that can't be implemented as a library for existing languages.
For programming languages alone, going feature-first is a good way to produce an obscure language that no one is interested in. Without even a small community, the original dev themselves won't use it.
Where a new language makes sense is in ergonomics, not in features.
Can you make the syntax such that people onboard quickly? Can your syntax support something complicated in a manner that the most simple-minded developer can understand? Is your syntax amenable to collaboration? Can it be easily parsed in pieces for IDEs? Will the output be package-distributable or module-distributable? Can you ensure easy GDB integration? What build mechanism can be used (for reading the sources and figuring out dependencies)?
Syntax is the major difference between writing in Kotlin and writing in Java.
For my new language project, I'm still trying to nail down what the syntax should look like. I have no problem documenting the tree to support the features I want, but I find that settling on what good syntax looks like to the majority of corporate developers is really, really difficult.
Compare with, from an AST, emitting code for some advanced language constructs. That's almost a mechanical effort that I think I can delegate a lot of to ChatGPT.
Designing universally acceptable syntax, on the other hand, is a lot more complex and requires actual human decision-making.
I prefer the analogy of painting. You start with collecting references, exploring ideas in a sketchbook, make color tests, draw outlines on canvas, use big brushes for colors, refine with smaller and smaller details.
The problem is that programming is all details / only details. There is no easy way to use big brushstrokes, so you have to improvise and not lose the overview. It doesn't help that engineers love details.
If you have an idea for a format or a language and would like to quickly start hacking on the layer above the syntax, Jevko is an option.
It's meant to be even simpler and more hackable than S-expressions.
It gets you from a string to a tree in the fewest steps.
See here[2] if interested.
Happy hacking!
[0] https://jevko.org/ [1] https://djedr.github.io/posts/jevko-2022-02-22.html [2] https://gist.github.com/djedr/151241f1a9a5bc627059dd9b23fc74...
I was more directly inspired by Lisps, but I do prefer the original M-expressions and the syntactic choices that REBOL and Red make.
I think placing the operator before the opening bracket better emphasizes its special significance and can reduce nesting for constructs like `f[x][y]` (vs. `((f x) y)` in Lisps). Square brackets somehow seem more aesthetically pleasing to me. And there is a practical reason to prefer them, especially if your syntax uses only one kind of brackets -- square brackets are the easiest to type on an average keyboard.
So REBOL-like syntax is nicer. As were M-expressions. They probably didn't catch on, because they were not minimal enough, compared to S-expressions. And maybe because S-expressions were fully implemented first.
Only to the extent that AST structure depends on syntax. For something like s-expressions defining new semantics never requires new syntax since arbitrary trees suffice to syntactically express any AST.
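A small sketch of why that holds: a complete S-expression reader fits on one screen, and it never needs to know which constructs exist. The reader below is a simplification written for this comment (atoms stay plain strings, no quoting, no string literals), and `match`, `some` and `none` in the usage note are invented forms, not taken from any particular Lisp.

```python
def read_sexpr(text):
    """Read one S-expression into nested Python lists of string atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise SyntaxError("unexpected )")
        if tok != "(":
            return tok  # an atom
        items = []
        while True:
            if pos >= len(tokens):
                raise SyntaxError("missing )")
            if tokens[pos] == ")":
                pos += 1
                return items
            items.append(read())

    tree = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree
```

Reading `(match x ((some y) y) (none 0))` gives `['match', 'x', [['some', 'y'], 'y'], ['none', '0']]`. Adding `match` to the language is then purely a matter of teaching later phases what that tree means; the reader is untouched.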
Well, 12 years later (i.e. early 2023) I realized that I don't really have any kind of cool side project and started implementing the same idea in C (with the added goal of the VM being natively multithreaded with fine-grained locking along the lines of JikesRVM and WebKit). Well, I have stub implementations of classes needed for the AST representation and S-expression reader and writer…
Benefits of this approach include there being no such thing as a syntax ambiguity, language extensions being as easy to make as libraries, the language being easy to write by non-programmers, and a JetBrains IDE for your language - at the cost of not being able to use any other editor.
A great example of such a language is mbeddr[1].
[0]: https://www.jetbrains.com/mps/
[1]: http://mbeddr.com/
I had hope, even though I knew going in what the deal with projectional editing was, and indeed I was disappointed. The number of proprietary "DSLs" you have to grok to be proficient at using MPS is just mind-boggling, and that's coming from someone who designs DSLs for a living. Like, by the time you get to codegen you're already 3 or 4 deep. And it's only for codegen, where you're generating Java, that you get the full projectional editor treatment. Everything else is just a GUI form for some small "DSL" with effectively dropdowns, so you might as well just call it an API for a configuration system rather than a language. Like, there's zero composition for the structure "language" and the editor "language". You're literally just toggling forms.
It's just a complete turn-off because even if it is "powerful" it's completely non-portable - you cannot ship anything without shipping MPS to orchestrate the cascade of dsls.
I tried to get mbeddr to work but could not, even though I can drive Gradle etc. just fine.
Overall really disappointing.
Sorry if this sounds stupid or obvious, but with this kind of thing I find that it's easier to cross the finish line if you maintain humble goals. Focus on just getting a working end-to-end MVP. Refine and enhance it down the road; don't get stuck trying to make version 0 an awesomely praiseworthy effort.
Hoping to hear a positive update from you in the near future!
The other reason is that a language like Common Lisp is defined in terms of the data-structures used by the language and the language has no unique text representation: the “default” reader uses a slightly extended version of s-expressions, but any data structure in the language can be evaluated and any transform from text to data structures can be a textual syntax for Common Lisp.
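A loose analogy in Python, to show the shape of the idea rather than how Common Lisp actually works: the evaluator below is defined only over data structures (nested lists, strings, numbers), so any function that produces such structures from text serves as a "syntax". The mini-language (`let`, `+`, `-`, `*`) and the name `evaluate` are assumptions invented for this sketch; `json.loads` is used as one arbitrary choice of reader.

```python
import json
import operator

# Binary operators of the made-up mini-language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(form, env=None):
    """Evaluate a form given as plain data, with no text involved."""
    env = env or {}
    if isinstance(form, (int, float)):
        return form  # numbers evaluate to themselves
    if isinstance(form, str):
        return env[form]  # strings are variable references
    head, *args = form
    if head == "let":  # ["let", name, value_form, body_form]
        name, value, body = args
        return evaluate(body, {**env, name: evaluate(value, env)})
    left, right = (evaluate(arg, env) for arg in args)
    return OPS[head](left, right)


# A program built directly as a data structure...
program = ["let", "x", 2, ["*", "x", ["+", 1, 3]]]

# ...and the same program arriving through one possible textual syntax.
same_program = json.loads('["let", "x", 2, ["*", "x", ["+", 1, 3]]]')
```

Both `evaluate(program)` and `evaluate(same_program)` give 8. Swapping JSON for an S-expression reader, or anything else, would not change `evaluate` at all.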
Course page: https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...
Lisp helps with this. You have the syntax settled, and can concentrate on the language from the get-go with all your steam.
Naturally there are limitations here. There's not much you can do to poke into the structure of Haskell functions other than evaluating them, so any sort of optimization and code gen work must happen by evaluating the function in Haskell, which gives you more of a challenge in designing the AST.
That said, so far the author's language resembles simply typed lambda calculus, and this is generally too simple. The meat of the language hasn't actually been designed yet in this post.
You could also try doing an embedded DSL in Haskell or even Kotlin if you just want to get your feet wet.
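The same trick works in most host languages with operator overloading. Here is a hedged sketch in Python rather than Haskell or Kotlin, just to show the shape: host-language expressions build an AST, and you go straight to working on the tree without writing a parser. All the names (`Expr`, `Num`, `Var`, `Add`, `Mul`, `lift`, `emit`) are made up for the example.

```python
from dataclasses import dataclass


class Expr:
    """Base class: overloaded operators build AST nodes instead of computing."""

    def __add__(self, other):
        return Add(self, lift(other))

    def __radd__(self, other):
        return Add(lift(other), self)

    def __mul__(self, other):
        return Mul(self, lift(other))

    def __rmul__(self, other):
        return Mul(lift(other), self)


@dataclass(frozen=True)
class Num(Expr):
    value: int


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


def lift(value):
    """Wrap plain Python numbers so they can sit inside the AST."""
    return value if isinstance(value, Expr) else Num(value)


def emit(expr):
    """A stand-in 'backend': print the tree as fully parenthesized source."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    op = "+" if isinstance(expr, Add) else "*"
    return f"({emit(expr.left)} {op} {emit(expr.right)})"
```

Writing `2 * Var("x") + 1` in the host language produces `Add(Mul(Num(2), Var("x")), Num(1))`, and `emit` turns that into `((2 * x) + 1)`. Precedence and tokenizing are borrowed from Python for free, which is the whole appeal when you only want to get your feet wet.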
Having nice syntax is a good thing, yet no matter what the syntax is, it ends up as an AST. And the structure of AST records and the ways they can be combined matters quite a lot, I would say more than text syntax. XML makes it visible and easy to experiment with without getting distracted by text syntax niceties.