> an s-expression [is] a fancy term for a list of [cons] cells
No, it's not. An S-expression is a serialization of a (single) cons cell, whose elements might be other cells.
> We're using a vector instead [of cons cells], but the two are equivalent
No, they are not. When represented as cons cells, CDR is a non-consing O(1) operation. When represented as vectors, CDR is a copying O(n) operation (and because it necessarily makes a copy, the semantics are different if you introduce mutation).
The fact that S-expressions represent cons cells and NOT vectors is crucially important. It is the feature from which Lisp derives much of its power.
It is possible to make a Lisp-like language where the surface syntax represents vectors instead of cons cells. In fact, this makes a useful exercise. If you undertake this exercise you will come to know the answer to the question: why has this idea (a vector-based Lisp) not gained more wide-spread adoption?
> There are three different sorts of data structure that we have so far: Strings. S-expressions, and Cells
As above, S-expressions are not a data type, they are a serialization of cons cells, i.e. they are a mapping between cons cells and strings, and this mapping has a very particular feature from which much of Lisp's power is derived, which is that it compresses a linked list of cons cells into this:
(1 2 3 4)
instead of this
(1 . (2 . (3 . (4 . nil))))
This language is missing that core feature. (It's also missing symbols, without which a language cannot rightfully call itself a Lisp. And first-class functions. And macros.)
While I don't want to discourage any attempt to make Lisp more accessible, it's important to actually get it right. What this article is describing is similar in spirit to Lisp, but it's not Lisp. It's a different (toy) language at a completely different point in the design space.
The misunderstanding, I think, comes from the fact that lisp code is typically quite terse, which is often a measure of simplicity. But part of the reason lisp code is so simple is that a lot of the heavy lifting is done behind the scenes. There's no getting around that software is complicated: the benefit of lisp is that it does a good job of managing the complicated parts while giving the programmer responsibility for a small subset of the functionality which is still powerful enough to enable the programmer to do what they need/want to do.
I have been programming in Lisp for a few years, and I think that Lisp programs are verbose, and not terse at all. This is because Lisp special forms and functions tend to be explicit and self-contained, which makes Lisp code more robust and easier to change and compose, than more terse notations that leave a lot of details implicit.
Non-trivial programs are just shorter to write in Lisp than many other languages, partly because the Lisp primitives are better thought out, and partly because that explicitness turns out to be more concise than working around the implicit special-case rules and poor error handling of other notations, that you end up needing to do in larger programs. Things that work for one-liners may not work as well once you get past one thousand lines.
You also see this in Common Lisp with DSLs like LOOP and FORMAT, which seem convenient until you run up against their limits (part of the reason why many people dislike them). The thing is that it is much easier to go from fully delimited/explicit to terse and implicit than vice-versa, if you have good primitives. Because of dynamic scoping and read macros, you can do really terse DSLs in Common Lisp in a very straightforward way (for example, Andrew Sengul did an APL subset that can be wrapped in a read macro, and I think sed would be pretty easy to do).
Heck, HN is powered by Arc which is (last I checked) running on Racket (or any other Lisp engine)!
We could also know the answer if you just told us. It's bad form to make readers jump through hoops for insights you could easily share. This is not school, you are not supposed to assign homework to make us learn.
It's gained incredibly widespread adoption. Clojure doesn't use cons cells at all; lists are trees of vectors internally, and vectors and maps enjoy first-class status, even as syntactic elements, right alongside lists. Clojure enjoys far more business interest than Common Lisp or any other Lisp dialect, meaning that non-cons-cell based Lisp has won.
Furthermore, the C++ programmers, being performance minded, have learned something the Lisp guys still don't seem to grok: on modern architectures where cache is fast and RAM is slow, the big-O inefficiency of using vectors vs. linked lists is usually more than made up for by the speed gains of cache locality, to the extent that for real-world data structures it almost never makes sense to use a linked list rather than a vector.
Anyway, if you represent a list as a slice of an underlying vector, cdr is still O(1).
In terms of usage, yes. Not in terms of adoption as an implementation technique. If you want a non-cons-cell-based Lisp (and no one actually wants this -- that code is vectors is not the reason people use Clojure) you have only one option.
> vectors and maps enjoy first-class status,
They do in Common Lisp as well.
> even as syntactic elements
Vectors have a standard surface syntax in Common Lisp, and it's trivial to define a reader macro for maps if you want one. But anything you would do with a static map is probably better done with a CASE form.
> the big-O inefficiency of using vectors vs. linked lists is usually more than made up for by the speed gains of cache locality
All of this only matters when processing code, which is usually a tiny fraction of the CPU time. When processing data, Lisp has vectors, and Lisp programmers can and do use them.
> if you represent a list as a slice of an underlying vector, cdr is still O(1).
Yes, but then you still have to allocate memory to store the result.
Contrary to your claim, that's nothing new. Memory hierarchies have existed decades ago, too.
Lisp users have found that linked lists as a primitive data structure were fast enough for many applications. Other native data structures like vectors, multi-dimensional arrays, hash-tables, etc. have been supported in Lisp for decades.
Just like there is no non-H2O water.
Well, there are isotopes: D2O is "heavy water". Similarly, not all Lisps have exactly the same cons representation down to the sub-atomic level; there are various isotopes.
The taxonomy isn't quite right for Lisp. It's actually simpler than the post. It should be something like this (very minimalistically):
<s-expression> ::= <atom> | ( <s-expression> . <s-expression> )
<atom> ::= <number> | <symbol> | <string>
...
The difference between this and what the author presents is that everything is an s-expression. s-expressions are either an atomic value or a pair of s-expressions. Convenient notation `(a b c ... z)` called "list builder notation" is provided so that we don't have to write `(a . (b . (c ... (Z . NIL))))`.I really wanted to show that core part of a programming language can be quite simple and nothing to be afraid of so everything else was subservient to getting that across :-)
A pair is just the most basic kind of compound data type. A triplet is just an atom plus a pair, and a quadruplet is an atom plus a triplet, and so forth. For any data structure containing N items, it can be represented as a sequence of N-1 cells.
It's possible to map this to custom data types. Say you wanted a type of `vec {x, y, z}`, then you can represent it as a list `(x (y (z . ())))` where you could define `get-x = car`, `get-y = cadr` and `get-z = caddr` to access the elements by name. Some lisps let you encapsulate the result such that only those functions can be used on the encapsulated list.
Rather then being more powerful to build everything into a language, it actually makes it less powerful when you try to do everything rather than providing the foundations on which other types can be built. See also: "Growing a language" by Guy Steele.
For instance, in common lisp there are procedures like `MAKE-ARRAY` and `MAKE-HASH-TABLE`. The interesting thing is that these hash tables and arrays are considered to be atomic.
Clojure generalizes the notion of pairs, which in some ways is more elegant (a collection of items being atomic isn't necessarily intuitive) and in some ways is less (now your language has to describe how this abstraction operates).
Especially cool how all evaluation is done by line 33. (Minor comment: Line 14 may benefit from using a different variable for the function instead of “c”.)
An “aside” comment, as the title mentions “literate C++” [Edit: This was submitted as something like “Build me a Lisp: in less than 50 lines of literate C++”]. This post exemplifies one aspect of literate programming (as Knuth envisioned it), which is “let us concentrate rather on explaining to human beings what we want a computer to do.”
But there are two further features that Knuth-style LP systems have, which are missing here:
• The ability to write your program / explanation in any order (ideally, an order that is best for presentation, rather than an order demanded by the compiler). In this case, the program is explained strictly in the order of lines in the program, which imposes some awkwardness. (Not only in the explanation, but also in the code: e.g. putting a #include on line 34 of 49.)
• Named chunks. So instead of writing
int main() {
try {
in one code block, and having to continue it later, you could write something like, say, int main() {
try {
[[say hello world]]
} [[handle errors]]
and later fill in the corresponding sections by name.The original examples by Knuth are somewhat hard to read because they were written in Pascal and the programs were quite low-level but there's a great example available today (a book which won an Academy Award!), now free online — http://www.pbr-book.org/ (random page: http://www.pbr-book.org/3ed-2018/Light_Transport_I_Surface_R... )
But then again, the “no-tools” approach here does have the advantage of not requiring any special build step and not requiring the reader to understand any such conventions, so I guess that's a tradeoff. (They say one of the reasons literate programming never took off is that there are as many literate-programming tools as the number of people trying literate programming; everyone ends up having to write the tool that works for them.)
The Babel tool in emacs org-mode looks like it does the real thing: https://orgmode.org/worg/org-contrib/babel/intro.html
Ultimately a program is a set of instructions to a computer, and when we as humans look at a program we're trying to conceptualize how these instructions are organized. (Essentially, “translate” them into a mental map.) We have so much more experience doing this with “real” code, knowing what we can ignore and when, and being able to do this at well — e.g. with things like function docstrings or public/private declarations, we're able to read them once and then have them basically fade away from our consciousness so we don't read them again. When faced with unfamiliar conventions, it takes a while to get that ability again.
(defvar foo 5) ; variable with the name foo
(defun foo () 5) ; function with the name foo
In a Lisp 1 the defun would overwrite the previous variable declaration. In a Lisp 2 though, both expressions coexist.https://groups.google.com/forum/#!msg/comp.lang.lisp/6TkfPLe...
Also as pointed out by KMP later in that thread, block and go tags are separate namespaces, and so are type names. I like to say that it is really Lisp1 vs LispN. Any language with macros and first-class identifiers is going to be capable of having arbitrarily many namespaces.
Is the “auto” in this case being filled in by some compiler-generated proxy type that can dispatch any common operation available across all variants?
It’s effectively a callable object with multiple overloads but I didn’t know lambda syntax could create something like that.
I expect we'll extend use of this sort of mechanism much further the more people understand it.
I'll watch that video, sounds interesting. A company I worked for nearly ended up getting a Symbolics LISP machine for CGI in the early 90s. Would have been pretty sweet I think.
I don't think of that as click-bait in the same way as "you won't believe #7", but rather as additional information which truthfully and transparently makes it more likely that I'll accurately gauge my interest in the linked article.
"Don't editorialize" in this case is a negative, but perhaps is globally optimal.
type cell = Atom of string
| Sexpr of cell list
let library = Hashtbl.create 1
let rec eval = function
| Atom s -> s
| Sexpr [] -> failwith "Empty sexpr"
| Sexpr (head :: args) ->
let fname = eval head in
match Hashtbl.find_opt library fname with
| Some builtin -> builtin args
| None -> failwith (String.concat " " [fname; " not found"])
let main () =
let prog = Sexpr [Atom "cat"; Atom "Hello "; Atom "world!"] in
let prog2 = Sexpr [Sexpr [Atom "cat"; Atom "c"; Atom "a"; Atom "t"];
Atom "Hello "; Atom "world!"] in
Hashtbl.add library "cat"
(fun args -> String.concat "" (List.map eval args));
Printf.printf "%s\n" (eval prog);
Printf.printf "%s\n" (eval prog2)
let _ = main ()
Using linked lists instead of vectors makes separating an application's head and args (CAR and CDR, if you prefer) much easier than using clunky C++ iterators. This is because linked list cells are composed of cons cells."Scheme 9 from empty space" -- https://t3x.org/s9fes/
It's a scheme built in c using literate programming. You can even buy a printed version.
Also, there's "Lisp in small pieces" if you want to go deeper -- https://books.google.de/books/about/Lisp_in_Small_Pieces.htm...
As a side note, I wrote a similar post in Swift: https://www.uraimo.com/2017/02/05/building-a-lisp-from-scrat...
And I can’t be bothered with C++ for compiler construction.