
0 points · jakear · 7y ago

Skimmed the article, left me a bit confused. Am I missing something big, or is this not particularly novel? The similarities between s-expressions and XML seem fairly obvious to me.


7 comments · 2 top-level

txru · 7y ago · 3 in thread

Well, it is and isn't novel. S-expressions have been around since McCarthy, XML since the mid-90s. The point of the article is that XML is a more verbose reinterpretation of s-expressions; it's hard to find things that XML brings to the table that sexps don't have. What's more, inside editors, there are really clever tools that manipulate sexprs, move them around, and redefine their semantic meaning. XML usually doesn't work quite that way, at least not as a primary intent.

tannhaeuser · 7y ago

SGML has been around since the 1980s (its predecessor GML dates to the late 1960s), and XML is specified as a proper subset of it. SGML/XML isn't so much about (trivial) nesting as it is about content models, e.g. the language defined by a regular expression that is admitted/recognized as the content of a particular element. Markup is also first and foremost a plain text format, optionally tagged by start-/end-element tags, unlike sexprs, which need quotes around individual spans of text. Try telling an author to use verbose quoting (and escaping for quotes) for what makes up the majority of his text, or try editing a large text with verbose sexprs yourself, and you'll see why nobody uses sexprs for semistructured text.
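To make the content-model idea concrete, here is a minimal sketch in Python. It treats each element's allowed children as a regular expression over child element names, roughly what a DTD declaration such as `<!ELEMENT html (head, body)>` expresses. The `CONTENT_MODELS` table and the `children_match` helper are hypothetical names made up for illustration, and the models are simplified, not copied from any real DTD.

```python
import re

# Hypothetical, simplified content models: each is a regular expression
# over child element names, with every name followed by one space.
CONTENT_MODELS = {
    "html": r"head body ",
    "head": r"title (meta |link )*",
    "ul": r"(li )+",
}

def children_match(element, child_names):
    """Return True if the sequence of child names is accepted by the
    content model declared for `element`."""
    model = CONTENT_MODELS[element]
    # Encode the children as a single string, e.g. "head body ".
    encoded = "".join(name + " " for name in child_names)
    return re.fullmatch(model, encoded) is not None
```

With these assumed models, `children_match("html", ["head", "body"])` accepts the sequence, while `children_match("html", ["body"])` rejects it because the required `head` is missing.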

txru · 7y ago

I take your point, and I really don't want to be the spark of a syntactic flamewar. I suspected I was missing something, and that the $angle_bracketed_format was older, but I was searching the wrong things.

It seems to me, though, that escaping is just something that's going to be tricky everywhere, and a decent first-line solution, wherever you are, is to have a really rare set of characters represent your begin/end string marks: Python has """text""", Postgres has $$text$$, and other formats use non-ASCII characters. XML and sexps are both susceptible to that issue: both of their escapes are, themselves, escapable. In either one, if you have a subregion that's likely to be unintentionally escaped, then you create a boundary where you either explicitly escape every one, or you refuse to acknowledge previously accepted delimiters. As an example, lisps have (quote term) rather than 'term when you're writing macros and concerned with macro-expansion.
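As a rough illustration of the "rare delimiter" approach, the sketch below picks a Postgres-style dollar-quote tag that does not collide with the text, so nothing inside has to be escaped. The function name `dollar_quote` and the `q1`, `q2`, ... tag scheme are assumptions made up for this example; it shows the idea only and is not a complete implementation of Postgres's quoting rules.

```python
import itertools

def dollar_quote(text):
    """Wrap `text` in a $tag$ ... $tag$ pair chosen so that the closing
    mark cannot be confused with anything inside the text."""
    # Try the bare $$ first, then $q1$, $q2$, ... (hypothetical tag names).
    candidates = itertools.chain([""], (f"q{i}" for i in itertools.count(1)))
    for tag in candidates:
        delimiter = f"${tag}$"
        # Accept the delimiter only if a forward scan from the start of the
        # text first finds it exactly where the closing mark begins. This
        # also rejects overlaps such as text ending in "$" followed by "$$".
        if (text + delimiter).find(delimiter) == len(text):
            return delimiter + text + delimiter
```

The trade-off matches the comment above: the text itself stays untouched, and the cost moves into choosing a boundary that the content happens not to contain.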

To your regex point, there are lisps that definitely did awful deeds with that, particularly Emacs Lisp, but the more recent ones have solutions just like other modern programming languages and markups do.

To me the unending escape just kind of seems like a universal bug. While lisp is just as susceptible, lisps have perfectly reasonable ways of treating these problems: separate, make distinct, and as a last resort, escape.

goto11 · 7y ago

SGML came out of document authoring and publishing. SGML is more suited to this domain because (among other things) you don't have to quote every single string. It is not as if the SGML community didn't know about s-expressions: DSSSL, the style language for SGML, was based on Scheme, so they recognized that s-expressions were appropriate for some domains.

neilv · 7y ago · 2 in thread

It's obvious to any Lisp person, and very convenient, in some ways.

A few things usually going on with S-expression representation of XML or HTML encoding in a Lisp:

1. It uses some of the native basic types of Lisp -- the list, the symbol, and the string.

2. The HTML element values that you type in your source and that are displayed to you are generally in the same syntax, since that's how Lisps tend to work with the basic types. (For contrast, you don't type your HTML like `<html><body><p>Hi</p></body></html>` in your source, and then see it in the debugger like `HtmlElement#abcd1234("html", {HtmlElement#c948f447("body", {HtmlElement#e7e7e7e7("p", {HtmlCdata#c8c8c8c8c8("Hi")})})})`.)

3. The S-expression printed representation you see in your source can be less verbose than HTML or XML, such as by not needing HTML element end tags. Though you will have to put your HTML CDATA text as quoted string literals.

4. A Lisp person's typical code indenting (supported by the editor) tends to expose the tree/forest structure of HTML conveniently:

    (html (head (title "My Page"))
          (body (p "First paragraph.")
                (blockquote "Don't quote me on that.")
                (div (p "Another paragraph.")
                     (p "Yet another paragraph."))
                (p "Hey, it's a paragraph.")))

Note that I probably wouldn't type a huge book this way. I might instead use Markdown or a DSL or alternate reader, such as Scribble or its at-reader, mainly to get TeX-like paragraphs: https://docs.racket-lang.org/scribble/ https://docs.racket-lang.org/scribble/reader-internals.html
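For readers who don't have a Lisp at hand, the same idea can be approximated in Python with nested tuples standing in for lists and plain strings standing in for both symbols and text. This is only a sketch under those assumptions: the `render` function and the `page` value are hypothetical, and attributes are ignored entirely. It shows how the end tags, which the S-expression form never spells out, fall out of the nesting.

```python
from html import escape

def render(node):
    """Render a nested tuple 'S-expression' as an HTML string."""
    if isinstance(node, str):
        # Text content: escape &, < and > but leave quotes alone.
        return escape(node, quote=False)
    # The first item is the element name, the rest are its children.
    tag, *children = node
    inner = "".join(render(child) for child in children)
    # The end tag is generated from the nesting, never typed by hand.
    return f"<{tag}>{inner}</{tag}>"

page = ("html",
        ("head", ("title", "My Page")),
        ("body",
         ("p", "First paragraph."),
         ("blockquote", "Don't quote me on that.")))
```

Calling `render(page)` produces one HTML string with every element closed. Unlike in a Lisp, the tag names here have to be quoted strings too, which loses some of the convenience described in point 1.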

tannhaeuser · 7y ago

> The S-expression printed representation you see in your source can be less verbose than HTML, such as by not needing HTML element end tags

Tag inference/omission in SGML (and by extension HTML when seen as an application of SGML) is way more powerful. A minimal, valid HTML document is this:

    <title>Whatever title</title>
    <p>Text goes here

SGML's tag inference, when coupled with a DTD for HTML5 such as mine [1], will treat that as equivalent to this:

    <html>
      <head>
        <title>Whatever title</title>
      </head>
      <body>
        <p>Text goes here</p>
      </body>
    </html>

See details in slides or paper linked from [2].

[1]: http://sgmljs.net/docs/w3c-html51-dtd.html

[2]: http://sgmljs.net/blog/blog1701.html
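A toy version of that inference, for the two-line document above only, might look like the following Python sketch. It hard-codes a single rule (head-only elements go in `head`, and the first other element opens `body`), whereas an SGML parser derives this from the DTD's content models and tag-omission declarations. The names `HEAD_ELEMENTS` and `infer_structure` are hypothetical, and the element set is an assumption for illustration.

```python
# Assumed set of elements that may only appear inside <head>.
HEAD_ELEMENTS = {"title", "meta", "link", "style", "base"}

def infer_structure(elements):
    """Take a flat list of (tag, text) pairs written without html/head/body
    tags and return a nested tuple tree with the wrappers filled in."""
    head, body = [], []
    in_head = True
    for tag, text in elements:
        if in_head and tag in HEAD_ELEMENTS:
            head.append((tag, text))
        else:
            # The first non-head element implicitly closes <head>
            # and opens <body>; everything after it stays in <body>.
            in_head = False
            body.append((tag, text))
    return ("html", ("head", *head), ("body", *body))
```

For example, `infer_structure([("title", "Whatever title"), ("p", "Text goes here")])` yields a tree with the `title` under `head` and the `p` under `body`, mirroring the expanded document shown above.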

neilv · 7y ago

Neat that you've done this for HTML5. Early Web browsers tended to do some of that (with less-rigorous semantics, some of which I approximated in my early HTML parser). Consequently, Web pages in practice often did, too. Maybe that could make manual writing of documents in HTML5 more practical. For bits of HTML embedded in code lately, I've preferred a simpler model and syntax using S-expressions, but I could see the implied tags as very useful for handwriting documents SGML-style.
