0 points · sparkie · 12y ago · 0 comments

I suspect the parent thread was dreaming of a framework where languages with arbitrary syntax can be mixed and matched in user-defined ways.

Haskell isn't necessarily that language, partly because it still requires centralized coordination of the development of these "extensions" to ensure they're interoperable. There is only one parser for the language and its supported extensions, and many of them are built into the compiler rather than added as libraries. The exceptions are extensions done through quasi-quotation, such as MetaHaskell, or through some EDSL. Even that has its own problems: you'll have issues parsing if your quoted language happens to contain delimiters that conflict with Haskell's quasiquoting delimiters `[| |]`, producing syntax that cannot be parsed unambiguously (though this is perhaps very rare).

Perhaps the biggest hurdle to having a modular language is that we do not understand how to unambiguously parse the combination of two or more syntaxes. We only know that the composition of two CFGs results in another CFG, but with no guarantee of unambiguity. Other formalisms such as PEGs rely on ordered choice, where the first alternative that matches always wins, so the computer can't decide which choice you really want.
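To make the ordered-choice point concrete, here is a minimal sketch of a PEG-style parser in Haskell. Everything in it (`Parser`, `keyword`, `orElse`, and the two toy "languages") is made up for illustration and is not taken from any real parsing library. It only shows that the result of composing two grammars depends on the order in which you list them.

```haskell
-- A tiny PEG-style parser: consumes a prefix of the input or fails.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Match a literal keyword at the start of the input and return it.
keyword :: String -> Parser String
keyword kw = Parser $ \input ->
  case splitAt (length kw) input of
    (prefix, rest) | prefix == kw -> Just (kw, rest)
    _                             -> Nothing

-- PEG ordered choice: try the left alternative first and commit to it
-- if it succeeds; the right alternative is only tried on failure.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \input ->
  case p input of
    Nothing -> q input
    result  -> result

-- Two hypothetical languages that both claim input starting with "in".
langA, langB :: Parser String
langA = keyword "in"
langB = keyword "int"

-- The same two grammars composed in both orders.
composedAB, composedBA :: Parser String
composedAB = langA `orElse` langB
composedBA = langB `orElse` langA
```

On the input "int x", `composedAB` commits to "in" and leaves "t x" unconsumed, while `composedBA` consumes "int". Neither is ambiguous in the formal sense, but the parser has no way to know which one was meant.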

What makes Lisps great for composition of languages (or "EDSLs" in market speak) is that they bypass the parsing problem by asking you to write your language directly in terms of the syntax tree which a parser would generate, perhaps using macros or other functions to simplify the use of that tree. Instead of a language being vocabulary+syntax, we create new vocabulary for what would be done through syntax in other languages - and we can thus refer to it unambiguously. Something similar can be done in Haskell too, through regular functions and quotation.
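A rough sketch of that idea in Haskell, using only regular functions and a data type. The `Expr` type and the names `double`, `eval` and `example` are invented for this illustration; the point is just that the "program" is written directly as the tree, so there is nothing left to parse.

```haskell
-- The syntax tree a parser would otherwise produce, written by hand.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  | IfZero Expr Expr Expr   -- what another language would spell with `if` syntax
  deriving (Show, Eq)

-- New "vocabulary" built from an ordinary function rather than new syntax.
double :: Expr -> Expr
double e = Add e e

-- One possible semantics for the tree: evaluate it to an Int.
eval :: Expr -> Int
eval (Lit n)        = n
eval (Add a b)      = eval a + eval b
eval (Mul a b)      = eval a * eval b
eval (IfZero c t e) = if eval c == 0 then eval t else eval e

-- A small program written directly as a tree.
example :: Expr
example = IfZero (Lit 0) (double (Lit 21)) (Mul (Lit 2) (Lit 3))
```

Here `eval example` gives 42: the condition is zero, so the `double (Lit 21)` branch is taken.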

The parse problem is only really a problem because we're stuck with this silly model of "sequential text files" to describe code, and we're required to limit our languages such that a parser can take one of these text files and make sense of it. When we break out of this model, and use intelligent editors, we can reach the point where syntaxes can be composed arbitrarily, because we can indicate where each new syntax begins and ends. Diekmann and Tratt have demonstrated how this can be done while still appearing much like traditional text editing, which they call Language Boxes.[1][2]

Language Boxes only provide the means to compose syntaxes, but handling the semantic composition of languages is left to the authors of the languages which are being composed. Haskell is perhaps a good choice of language for providing the kind of glue needed here, where we can decide where languages can be composed based on the types returned by their parsers.

[1]: https://www.youtube.com/watch?v=LMzrTb22Ot8
[2]: http://lukasdiekmann.com/pubs/diekmann_tratt__parsing_compos...

chc · 12y ago

I don't think the problem stops at syntax. It's possibly an even bigger issue that mixing different language semantics can be awkward. As a big obvious example, a language where all objects are nullable will interface awkwardly with one that only has option types. Similarly, interfacing with something like Smalltalk (which uses methods for flow control) or Forth (which…is Forth) would be awkward from a language that's more like C++.

Even in an environment like the JVM which specifies a lot of stuff for you, it's awkward to call into Clojure from Java because of the semantic differences.

sparkie (OP) · 12y ago

I wasn't implying there is no problem with the semantics, just that it's much easier to deal with when you already have the parsed trees, because they're easier to reason about with code - and we can project them unambiguously.

We already do write tools for such language interoperability for specific pairs of languages, which is often really awkward because it requires us to re-implement the parsers, and only deals with entire code files rather than specific productions in the syntax.

It's pointless composing languages unless it makes sense semantically, which would need to be done on a per-language basis (or per production rule). That is what I was hinting at with using Haskell as the glue for such interoperability: if we encode the semantics into the type system, such that one syntax expects a language box of type T in its grammar, then one should be able to use any other language whose parser returns a T, and the semantics will be well-defined for it.
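A minimal sketch of what "a language box of type T" could look like if a box is modelled as nothing more than a function from source text to a value. The names `Box`, `Repeat`, `parseRepeat`, `decimal` and `tally` are all hypothetical, and a real Language Boxes editor would track the box boundaries itself rather than splitting on whitespace as done here.

```haskell
-- A "language box" modelled as any function from source text to a value of type t.
type Box t = String -> Maybe t

-- Hypothetical host production: "repeat <box>", where the hole must yield an Int.
data Repeat = Repeat Int deriving (Show, Eq)

parseRepeat :: Box Int -> String -> Maybe Repeat
parseRepeat box src =
  case words src of
    ("repeat" : rest) -> Repeat <$> box (unwords rest)
    _                 -> Nothing

-- Guest language 1: decimal integers.
decimal :: Box Int
decimal s = case reads s of
  [(n, "")] -> Just n
  _         -> Nothing

-- Guest language 2: unary tally marks, so "|||" means 3.
tally :: Box Int
tally s
  | not (null s) && all (== '|') s = Just (length s)
  | otherwise                      = Nothing
```

Both `parseRepeat decimal "repeat 3"` and `parseRepeat tally "repeat |||"` produce `Just (Repeat 3)`, because both guest parsers return an `Int`. Passing a box of some other type, say a `Box Bool`, is rejected by the type checker before anything is parsed.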

It could also provide the glue for converting between nullable types and option types, for example, by requiring that a language returning a "Nullable T" be wrapped in some function "ToOption", which converts "Nullable T" into "Option T". Attempting to use the Nullable where an Option is expected would fail to parse. How ToOption is implemented is left to the author of the code.
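Sketched in Haskell, with the caveat that `Nullable`, `Option`, `toOption`, `describe` and `guestResult` are stand-ins invented here, and that in this plain-Haskell version the mismatch shows up as a type error at compile time rather than as a parse failure in an editor:

```haskell
-- Stand-ins for the two conventions; both type names are illustrative.
data Nullable t = Null | NonNull t deriving (Show, Eq)
data Option t   = None | Some t    deriving (Show, Eq)

-- The explicit wrapper called "ToOption" above. This is one obvious
-- implementation; the author of the glue is free to choose another.
toOption :: Nullable t -> Option t
toOption Null        = None
toOption (NonNull x) = Some x

-- A host production that only accepts an Option Int in its hole.
describe :: Option Int -> String
describe None     = "nothing"
describe (Some n) = "got " ++ show n

-- A guest language whose result is nullable.
guestResult :: Nullable Int
guestResult = NonNull 7

-- Accepted, because the Nullable is explicitly converted first.
ok :: String
ok = describe (toOption guestResult)

-- describe guestResult   -- rejected: Nullable Int is not Option Int
```

With these definitions `ok` is "got 7", and the commented-out line does not compile.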

It's much easier to have interoperability between individual production rules in different languages (which share many parts in common) versus "whole text files" which we currently have, which basically require the languages be almost equivalent to convert between them.

Also, as a result of storing the semantic information as opposed to sequential text, it would be possible for the user to choose his preferred syntax for any semantic elements in the tree, since they're just working on a pretty-printed version. Most of the concerns about "code style" disappear because they're detached from the actual meaning that is stored.

dmytrish · 12y ago

Yes, Haskell does not allow free composition of syntax extensions as easily as Lisp does. But it gets semantic composition right, using monads to encapsulate semantics and monad transformers to compose effects easily and in a controlled way. I think the latter is much more valuable.
