can you show me a parser generator that produces this kind of visualization?
Whenever my "compiler" found a syntax error in the test suite, I could load the part of the source around the error and track down where my parser's bug or omission was by running the parser for smaller and smaller parts of the grammar on smaller and smaller parts of the input.
It was 12 years ago.
And yes, it is fun. ;)
However (and this is just me talking), I don't see the point in a javascript-based compiler. Surely any file format/DSL/programming language you write will be parsed server-side?
JavaScript is a full programming language. Why wouldn't it be a fine choice to write a compiler in? People have a funny idea that compilers are unusually complex software, or somehow low-level. In reality they're conceptually simple: as long as your language lets you write a function from one array of bytes to another array of bytes, you can write a compiler in it. For practicalities beyond that you just need basic records or objects or some other kind of structure, and you can have a pleasant experience writing a compiler.
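To make that "function from bytes to bytes" claim concrete, here's a minimal sketch (the toy source language and the made-up stack-machine target are my own invention, not from any real project): a complete "compiler" from an addition language to assembly-ish text, in a few lines of JavaScript.

```javascript
// A toy compiler: source text in, target text out.
// Source language: integer addition, e.g. "1 + 2 + 3".
// Target language: a made-up stack machine with PUSH and ADD.
function compile(source) {
  const operands = source.split("+").map((s) => s.trim());
  // Emit one PUSH per operand...
  const code = operands.map((n) => `PUSH ${n}`);
  // ...then enough ADDs to fold the stack down to one value.
  for (let i = 1; i < operands.length; i++) code.push("ADD");
  return code.join("\n");
}

console.log(compile("1 + 2 + 3"));
// PUSH 1
// PUSH 2
// PUSH 3
// ADD
// ADD
```

A real compiler adds a tokenizer, a tree, and error reporting, but the shape stays the same: a pure function from one string to another, which any language can express.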
> Surely any file format/DSL/programming language you write will be parsed server-side?
JavaScript can be used user-side, or anywhere else. It's just a regular programming language.
TypeScript, Sass, JSX... there are a lot of languages running on top of JS. Or you might want to do colorizing or autoformatting on input in the browser?
Along with all that, there are, as mentioned, Node.js and Deno for running server-side.
But at any rate - lots of front-end problems involve various kinds of parsing/validation and transformation (eg: processing.js).
JavaScript doesn't seem suited to compiler construction because it lacks many of the features that make compiler construction pleasant (e.g. strong, rich types, algebraic data types, etc.)
It might be "fine" but it's not "good".
“If I send someone an executable, they will never download it. If I send them a URL, they have no excuse.”
If someone interested in a compiler doesn't download it, that's not an excuse, it's a filter. Or a warning sign.
The choice of language often matters a lot less than how familiar you are with it (and its ecosystem(s)). I think it's totally reasonable to want to use JS for a compiler in, e.g., a Node project if for no other reason than to not have to learn too many extra things at once to be productive with the new tool.
I also don't think it's fair to assume everything will be parsed, tokenized, etc server-side. Even assuming that data originates server-side (since if it didn't you very well might have a compelling case for handling it client-side if for no other reason than latency), it's moderately popular nowadays to serve a basically static site describing a bunch of dynamic things for the frontend to do. Doing so can make it easier/cheaper to hit any given SLA at the cost of making your site unusable for underpowered clients and pushing those costs to your users, and that tradeoff isn't suitable everywhere, but it does exist.
It's interesting that you seem to implicitly assume the only reason somebody would choose JS is that they're writing frontend code. It's personally not my first choice for most things, but it's not too hard to imagine that some aspect of JS (e.g., npm) might make it a top contender for a particular project despite its other flaws and tradeoffs.
But I’m standing my ground because I’m not even writing a proper “compiler” - in my case, the output is JSON. So it just kinda feels like it makes sense to stick with JS.
(and you can always decide that you need more speed - if you have a grammar defined, it's almost trivial to feed it to some other parser-generator)
Well, JavaScript has been heavily used on the server side for over a decade, with Node, WASM and other projects.
And as far as raw speed goes, something like V8 smokes all scripting languages bar maybe LuaJIT.
So, there's that...
There is a world of difference in accessibility between a tool that requires installation and a tool that you can use by following a hyperlink.
My CC is Javascript based (well it was initially, then TypeScript, now a lot of it is written in itself).
99% of the time I use the actual languages I make in it server side (nodejs), but I am able to develop the languages in my browser using https://jtree.treenotation.org/designer/. It's super easy and fun (at least for me, UX sucks for most people at the moment). There's something somewhat magical about being able to tweak a language from my iPhone and then send the new lang to someone via text. (Warning: Designer is still hard to use and a big refresh is overdue).
It works great for our use-case though I have been eyeing tree-sitter[2] for its ability to do partial parses.
[1] USFM: https://ubsicap.github.io/usfm/ [2] https://tree-sitter.github.io/tree-sitter/
Don’t remember anything about an office suite. Related names I remember are Alan Kay, Dan Amelang, Alessandro Warth and Ian Piumarta.
https://en.m.wikipedia.org/wiki/Ometa (including reference section)
Or go to: http://www.vpri.org/writings.php
If I recall correctly you want: "STEPS Toward the Reinvention of Programming, 2012 Final Report Submitted to the National Science Foundation (NSF) October 2012" (and earlier reports)
Discussed on hn: https://news.ycombinator.com/item?id=11686325
And: https://news.ycombinator.com/item?id=585360
Notable for implementing TCP/IP by parsing the RFC.
"A Tiny TCP/IP Using Non-deterministic Parsing Principal Researcher: Ian Piumarta
For many reasons this has been on our list as a prime target for extreme reduction. (...) See Appendix E for a more complete explanation of how this “Tiny TCP” was realized in well under 200 lines of code, including the definitions of the languages for decoding header format and for controlling the flow of packets."
(...)
"Appendix E: Extended Example: A Tiny TCP/IP Done as a Parser (by Ian Piumarta) Elevating syntax to a 'first-class citizen' of the programmer's toolset suggests some unusually expressive alternatives to complex, repetitive, opaque and/or error-prone code. Network protocols are a perfect example of the clumsiness of traditional programming languages obfuscating the simplicity of the protocols and the internal structure of the packets they exchange. We thought it would be instructive to see just how transparent we could make a simple TCP/IP implementation. Our first task is to describe the format of network packets. Perfectly good descriptions already exist in the various IETF Requests For Comments (RFCs) in the form of "ASCII-art diagrams". This form was probably chosen because the structure of a packet is immediately obvious just from glancing at the pictogram. For example:
+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
| version | headerSize | typeOfService | length |
+-------------+-------------+-------------------------+----------+----------------------------------------+
| identification | flags | offset |
+---------------------------+-------------------------+----------+----------------------------------------+
| timeToLive | protocol | checksum |
+---------------------------+-------------------------+---------------------------------------------------+
| sourceAddress |
+---------------------------------------------------------------------------------------------------------+
| destinationAddress |
+---------------------------------------------------------------------------------------------------------+
If we teach our programming language to recognize pictograms as definitions of accessors for bit fields within structures, our program is the clearest of its own meaning. The following expression creates an IS grammar that describes ASCII art diagrams."

I was disappointed with how they do operator precedence; they use the usual trick to make a PEG do operator precedence, which looks cool when you apply it to two levels of precedence, but if you tried to implement C or Python with it, it gets unwieldy. Most of your AST winds up being nodes that exist just to force precedence in your grammar, and working with that AST is a mess.
For all the horrors of the Bell C compilers, having an explicit numeric precedence for operators was a feature in yacc that newer parser generators often don't have.
I worked out the math, and it is totally possible to add a stage that adds the nodes to a PEG to make numeric precedence work and also deletes the fake nodes from the parsed AST. Unparsing I'm not so sure about, since if someone wrote
int a = (b + c);
how badly you want to keep the parens is up to you; a system like that MUST have an unparse-parse identity in terms of the 'value of the expression', but for software engineering automation you want to keep the text of the source code as stable as you can.

> You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.
ident ::= name | name ("." name)+
Because with PEGs, the parser tries the first alternative, then the second, and since whenever the second alternative matches, the first one matches too, we will never parse the second alternative. That's kinda annoying. Of course, with PEG tools you could probably solve this by computing the first sets of both alternatives and noticing that they're the same. Hopefully that's what this tool does.
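A hand-rolled sketch of the trap (my own toy matcher, not this tool's API): the first alternative of `ident ::= name | name ("." name)+` always wins, so the dotted form is dead code.

```javascript
// Returns the length matched by `name` at the start of s, or -1.
const name = (s) => {
  const m = /^[a-zA-Z_]\w*/.exec(s);
  return m ? m[0].length : -1;
};

// ident ::= name | name ("." name)+
// PEG ordered choice: alternative 1 is tried first and commits.
function identAsWritten(s) {
  const n = name(s);
  if (n >= 0) return s.slice(0, n); // alt 1 always wins...
  return null; // ...so alt 2 (which needs a leading name too) is dead
}

// Reordering -- or just writing name ("." name)* -- fixes it.
function identFixed(s) {
  const n = name(s);
  if (n < 0) return null;
  let len = n;
  while (s[len] === ".") {
    const m = name(s.slice(len + 1));
    if (m < 0) break;
    len += 1 + m;
  }
  return s.slice(0, len);
}

console.log(identAsWritten("foo.bar")); // "foo"  -- dotted tail ignored
console.log(identFixed("foo.bar"));     // "foo.bar"
```

In a CFG-based tool the two alternatives would be reported as ambiguous or unified; in a PEG the longer one silently never fires, which is exactly why a first-set check would be a nice diagnostic.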
https://github.com/harc/ohm/commit/4611bf63c5ecb90d782112d68...
2014
Neat tool. I write parsers by hand though. More fun, and you can be a lot sleazier.
Now Ohm survives as an open-source project, Bret Victor continues his work with Dynamicland, and Vi Hart is currently employed at Microsoft Research.
We detached this subthread from https://news.ycombinator.com/item?id=26604134.
http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/optimi...
The type is fine whether or not the line is present. It's all about that invariant.
None of the hair-pulling I've experienced in compiler debugging had anything even remotely to do with types, which are something flushed out by testing.
Whenever I'm doing anything, like an optimization test case, I put in print statements during development to see that it's being called and what it's doing. You'd never add a new case to a compiler that you never tested; just from the sheer psychology of it, too much work goes into it to then not bother running it. Plus there's the curiosity of seeing how often the case fires over a corpus of code.
Help! That's what I did. I chose to write my compiler in OCaml, a language that's already ~30 years old by now. But I cannot find any type annotations! What should I do? I'm stuck!
Lisp is one of the best compiler implementation languages. Doing the same in C or C++ is about 3-20x more effort.
There's nothing magical about Lisp that makes it super fit for compiler development.