Source Code Typography

(naildrivin5.com)

69 points · davetron5000 · 13y ago · 65 comments

Fa773NM0nK13y ago· 7 in thread

"occasional performance concerns require putting readability in the backseat, but this is rare"

occasional? Seriously? rare? Seriously?

"Source code should be written to be understood by people." Nope. Source code should be written to be executed. If people can understand it easily, it's a plus point, not a baseline.

Considering that the keyboard is the primary way we write source code, I find it difficult enough to keep my fingers up to speed with my thoughts. If, on top of that, I have to press Tab to align each of the statements in my 'for's, I'll be much worse off.

vec13y ago

One of my professors in college put it in a way that stuck with me. All source code has to be compiled by (at least) two different machines; the compiler and the programmer's brain. The machines have radically different parsers, but must end up with the same parse tree for the program to function correctly.

kruhft13y ago

> "Source code should be written to be understood by people."

This viewpoint is described well in the Preface to the First Edition of Structure and Interpretation of Computer Programs, 2nd paragraph:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-7.html

I would suggest reading the whole Preface. It's quite an inspirational work for this field.

Fa773NM0nK13y ago

"The source of the exhilaration associated with computer programming is the continual unfolding within the mind and on the computer of mechanisms expressed as programs and the explosion of perception they generate. If art interprets our dreams, the computer executes them in the guise of programs!"

Enjoyed it! Thanks!

kruhft13y ago

I apologize, the Foreword is the inspirational work I was speaking of:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-5.html

Not that the contents of the Preface should be ignored :)

Fa773NM0nK13y ago

Cool! I'll surely give it a read!

JungleGymSam13y ago

I'm a layman so I ask this question with sincerity. Would your opinion change if the formatting was performed manually, at the moment of your choosing?

Fa773NM0nK13y ago

"formatting was performed manually"? What do you mean by that?

I'd rather it be performed automatically. But, I find very few tools that would format it the way I want.

tmoertel13y ago· 4 in thread

Typography is a subtle and tricky thing. What seems “better” may actually provide misleading cues.

For example, consider the alignment in the following, improved snippet from the blog post:

    var x        = shape.left(),
        y        = shape.right(),
        numSides = shape.sides();

It may be “better” typographically, but it also suggests a false parallelism.

The eye can't help but interpret closely packed things as groups. So the subliminal cue presented by the formatting above is that there is a parallel assignment from the group of expressions on the right to the group of variables on the left. That is, at some level the eye can't help but see the code above as

                   { e }
         { v }     { x }
    var  { a }  =  { p } ;
         { r }     { r }
         { s }     { s }

But the evaluation and assignment are not parallel! They are sequential. The difference may not matter in this example, but it's easy to imagine this kind of formatting applied to examples where it would.

The less-formatted original version below actually represents the reality more faithfully because its shape does not suggest parallelism:

    var x = shape.left(),
        y = shape.right(),
        numSides = shape.sides();

It looks more like a sequence, which it is.

Indeed, typography is a subtle and tricky thing.
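The thread's example is JavaScript, but the same sequencing holds in C, so here is a rough C sketch of the point (the function name `sequential_sum` is made up for illustration). The initializers in a single declaration run one after another, so a later initializer can read a variable initialized on an earlier line:

```c
/* Hypothetical illustration: the declarators in one declaration are
   initialized in order, top to bottom, not "in parallel". */
int sequential_sum(int start)
{
    int x = start,
        y = x + 1,       /* reads the x initialized on the line above */
        total = x + y;   /* reads both earlier variables */
    return total;
}
```

Reordering these lines would change the meaning (or fail to compile), which is exactly what an aligned, table-like layout tends to hide.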

prawks13y ago

Typography doesn't address /what/ should be written but rather /how/ it should be presented to make what was written as readable as possible.

I think this hints at what you're saying, though the author's presentation style may imply that his is the way formatting needs to be done.

"What was written" is analogous to "what the author meant". If the author makes the assumption that those operations may be executed in parallel, regardless of order, then the example typesetting may be appropriate. "Better" typography shouldn't imply that it looks pretty, as in artistic. It should be more readable, without getting in the way of conveying the author's intent. It should aid the reader in reading, and in reading, arriving at the author's meaning.

As with anything, I don't think blindly adhering to example is a good idea; this case is no different.

You're dead-on that typography is subtle and tricky!

tmoertel13y ago

The point I was trying to make, however, is that since the example code is JavaScript, the reality is that a statement like that will always result in sequential evaluation and binding and, therefore, a typographical treatment that suggests otherwise will provide misleading cues always.

The author's intent doesn't matter at that point. The typographical treatment has already reached the reader's eyeballs.

If the language were Haskell, however, the align-on-equals treatment would actually communicate the truth. For example, the following code means exactly what its visual presentation suggests it means (see [1]):

    let x      = y + 1
        y      = foo 0
        foo i  = max 0 (i - 1)
    in ...

So this treatment adds clarity, not takes it away.

[1] http://www.haskell.org/onlinereport/exps.html#sect3.12

astrodust13y ago

The kind of people that line up their assignments and other syntax elements of the same sort are the ones that prefer justified text, even in inappropriate cases. It's annoying.

You also end up with "floaters", where if in this case `numSides` is removed, x and y assignments will have a needless number of spaces. These can be corrected, but you'll also inherit "blame" for the change, which is misinformation.

Keep them tight, learn to read code that way. Code is not ASCII art.

Luyt13y ago

Also, when you introduce a new variable in such a block of initialisers, and it is longer than the other variables, you have to pad all the other declarations with spaces just to keep them aligned... what a waste of time. Apart from that, it also generates 'false' diffs with source code control systems which are configured to be sensitive to whitespace.

georgef13y ago· 4 in thread

When I first started programming in C, I did the same thing with pointers, i.e., I wrote

    char* str;

instead of

    char *str;

Unfortunately, this creates the wrong impression that

    char* str1, str2;

creates two pointer-to-char variables, whereas actually str1 is a char pointer and str2 is simply a char. Indeed, I was confused on this point myself when I was a newbie, which led to great confusion later on.
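A minimal sketch that lets the compiler confirm this, assuming a C11 compiler (for `_Generic`); the function name is hypothetical and the variables are initialized only to keep the example well-defined:

```c
/* Returns 1 when `char* str1, str2;` declares a pointer followed by a
   plain char. Requires C11 for _Generic. */
int declares_pointer_then_char(void)
{
    char* str1 = 0, str2 = 0;   /* the spacing suggests two pointers */
    int first_is_pointer = _Generic(str1, char *: 1, default: 0);
    int second_is_char = _Generic(str2, char: 1, default: 0);
    return first_is_pointer && second_is_char;
}
```

The `*` binds to the declarator `str1`, not to the type `char`, so only `str1` is a pointer.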

The clearest way I've ever found to think about C declarations (ironically, I think I read this in some article maligning C's syntax in favor of Go's), is that each declaration is of the format

    [type] [expressions--one for each new variable--that equate to type];

Thus, the way I think of declaring a char pointer is

    char [de-referencing the variable (which is a char pointer) to arrive at the char];
    char *str;

(Obviously, not everyone is going to agree that that is simple, but it works for me and my brain.)

Anyway, the point is that while typography is great, it can be just as harmful as helpful if you communicate the wrong impression to the reader. And to be fair, it really looks like the author is not at home in C: besides his misconception about pointer declaration, he didn't bat an eye at the old-style argument-declaration syntax that has been obsolete since ANSI C.

Also, I hope I never write a for loop that looks so massively bloated--a matter of opinion I guess.

dsego13y ago

What is wrong with having str1 and str2 declared on separate lines? Like this:

  char* str1;
  char* str2;

What is gained with combining these declarations (except for less typing)?

georgef13y ago

Less typing is exactly the benefit, and with it less chance of error because of less repeated code. It is also another way of showing parallelism in your code.

And again, the wrong impression is conveyed (that you are declaring a variable of type char* instead of a pointer type that dereferences to a char).

With more complex types, not understanding what is really going on makes code incredibly opaque (and that typographical style ad hoc). What would you do with this:

    char *((*func)(char *));

That isn't a char pointer at all. It's a pointer to a function that takes a char pointer as an argument and returns (i.e., evaluates to) a char pointer. It looks incredibly dense (to me, at least), unless I think of it in the way I talked about above (in which case it all makes sense and is kind of cool).

With the convention used in the article, the declaration would be something like this:

    char* ((* func)(char*));

Which doesn't make clear why the outside parentheses are needed, or why there is a dereference (or multiplication?) operator in front of the variable name.

The idea of a variable declaration being an expression that evaluates to a basic type is actually the reason why the same symbol * is used in the declaration and in the dereferencing of pointers: they mean the same thing.

Basically, what I am trying to say is that the typographical practice used in the article is at odds with the actual meaning of the statement (and the explanation he gives shows he clearly does misunderstand the statement), which can only lead to confusion in the long run, especially when you encounter code written by other people. Or, as with your example, it can lead to eschewing a useful feature of a language simply because it doesn't look pretty according to your arbitrary whitespace conventions.
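As a sketch of the "declaration mirrors use" reading, here is the function-pointer declaration from above next to an expression with the same shape. The helper `skip_spaces` and the wrapper `first_char_after_spaces` are made-up names for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first non-space character. */
char *skip_spaces(char *s)
{
    while (*s == ' ')
        s++;
    return s;
}

char first_char_after_spaces(char *text)
{
    char *((*func)(char *));    /* the declaration discussed above */

    assert(text != NULL);
    func = skip_spaces;
    /* Same shape as the declaration: dereference func, call it with a
       char *, dereference the result, and you arrive at a char. */
    return *((*func)(text));
}
```

Read this way, the outer parentheses and the leading `*` in the declaration are the same ones that appear in the return expression.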

limmeau13y ago

Grouping comes to mind, e.g.

        int x, y;
        int width, height;
        int area;
        int numPoints;

_mpu13y ago

Less typing, fewer lines, more information on the screen and the readability is not impaired.

davidw13y ago· 4 in thread

On a somewhat related subject, one thing I've always idly wondered about Lisp and typography is whether the curvy parens, together with not much indentation, make it hard to line stuff up vertically.

A random elisp example:

    (while (> count 0)
      (re-search-backward regexp bound)
      (when (and (> (point) (point-min))
                 (save-excursion (backward-char) (looking-at "/[/*]")))
        (forward-char))
      (setq parse (parse-partial-sexp saved-point (point)))
      (cond ((nth 3 parse)
             (re-search-backward
              (concat "\\([^\\]\\|^\\)" (string (nth 3 parse))) 
              (save-excursion (beginning-of-line) (point)) t))
            ((nth 7 parse) 
             (goto-char (nth 8 parse)))
            ((or (nth 4 parse)
                 (and (eq (char-before) ?/) (eq (char-after) ?*)))
             (re-search-backward "/\\*"))
            (t
             (setq count (1- count))))))

See where the ends of the parentheses point? Not straight up or down, but diagonally.

Just a random thought... maybe it's just me.

akkartik13y ago

Yeah, I agree. Lisp is best with small functional blocks.

"It used to be thought that you could judge someone's character by looking at the shape of his head. Whether or not this is true of people, it is generally true of Lisp programs. Functional programs have a different shape from imperative ones. The structure in a functional program comes entirely from the composition of arguments within expressions, and since arguments are indented, functional code will show more variation in indentation. Functional code looks fluid on the page; imperative code looks solid and blockish, like Basic." -- On Lisp (http://www.paulgraham.com/onlisptext.html; page 30)

As function size grows, it doesn't look quite so pretty. An example from news.arc:

  (if no.user
      (submit-login-warning url title showtext text)
     (~and (or blank.url valid-url.url)
           ~blank.title)
      (submit-page user url title showtext text retry*)
     (len> title title-limit*)
      (submit-page user url title showtext text toolong*)
     (and blank.url blank.text)
      (let dummy 34
        (submit-page user url title showtext text bothblank*))
     (let site sitename.url
       (or big-spamsites*.site recent-spam.site))
      (msgpage user spammage*)
     (oversubmitting user ip 'story url)
      (msgpage user toofast*)
     (let s (create-story url process-title.title text user ip)
       (story-ban-test user s ip url)
       (when ignored.user (kill s 'ignored))
       (submit-item user s)
       (maybe-ban-ip s)
       "newest"))

There's a multi-branch if here, with lots of else ifs. But it's hard to see. In my toy dialect[1], I've added[2] colons as syntactic sugar that expands to nothing:

  (if no.user
      : (submit-login-warning url title showtext text)
     (~and (or blank.url valid-url.url)
           ~blank.title)
      : (submit-page user url title showtext text retry*)
     (len> title title-limit*)
      : (submit-page user url title showtext text toolong*)
     (and blank.url blank.text)
      : (let dummy 34
          (submit-page user url title showtext text bothblank*))
     (let site sitename.url
       (or big-spamsites*.site recent-spam.site))
      : (msgpage user spammage*)
     (oversubmitting user ip 'story url)
      : (msgpage user toofast*)
     : (let s (create-story url process-title.title text user ip)
         (story-ban-test user s ip url)
         (when ignored.user (kill s 'ignored))
         (submit-item user s)
         (maybe-ban-ip s)
         "newest"))

[1] https://github.com/akkartik/wart#readme

[2] http://arclanguage.org/item?id=16495

limmeau13y ago

Racket (and maybe other Schemes) lets you interchange [] and () freely. Helps a bit. e.g.

         (cond
           [(positive? -5) (error "doesn't get here")]
           [(zero? -5) (error "doesn't get here, either")]
           [(positive? 5) 'here])

brudgers13y ago

There are a couple of things I see which influence the typographic decisions regarding the code. First is its size - not really big enough to justify a data abstraction for the strings forming regexes. Being standalone, there is a justification for inlining these that would not be there for a similar snippet from part of a larger system.

A second feature appears to be optimizing the layout to display in "forty lines." There are places where the lines could be shorter but aren't. Not abstracting the strings falls somewhat into this category.

Finally, the code snippet does not appear to be the output of a pretty printer. Instead the typography is based on considerations beyond readability.

prawks13y ago

I think it's an easy thing to get used to, especially if the parentheses are a bit thicker. They're just a more slender character than a curly brace, so they don't appear to take up as much space as the others even in a monospaced font.

dsego13y ago· 4 in thread

Reading Code Complete (and listening to Crockford's talks) has really opened my mind to writing clearer code constructs. For example, the for-loop's job in the first example should be to track indexes. There shouldn't be code that "does stuff" between parens. Instead of superficially breaking the for-loop into several lines and wasting time on aligning semicolons, it could be re-written as a while loop with clarity in mind. Like this:

  while (*from != 0) {
    *to = *from;
    from++;
    to++;
  }

To me this looks much saner (unless I'm doing something wrong, I'm a bit rusty on pointers). But I notice that a lot of "C-hackers" try to cram as much into one line as they can, often including every possible pointer incrementation and assignment. At least here, the variables are properly named. A lot of C code uses one-letter variables and reading those isn't a lot of fun, e.g.

  while((*t++ = *f++) != 0 )

(Note, I don't really know if this is correct.)

oneeyedpigeon13y ago

Yes, a for loop is ridiculous for a string copy; K&R do it like so:

  while (*t++ = *f++)
      ;

The pointless comparison to 0 is removed, and the semi-colon is required to terminate the statement. Of course, the point about this version is, once you understand it, it's trivial to recognise. Parsing the 'full' version, especially as formatted in the article, takes a lot longer because it's not familiar and contains more parts to read, any of which could deviate from what might be expected. The other positive is much more code is visible at once.

dsego13y ago

I guess your example is so idiomatic that it's easy to recognise what it does at a glance. It's good to use easily recognisable patterns. The problem is that I really feel that pointer arithmetic should be separate, because in more complicated code the above style might lead to off-by-one errors that are hard to detect. Also, I don't think the comparison with 0 is pointless. I feel that you should only remove the comparison when an expression gives a boolean result. One other problem here is that, at least for me, it seems safer to only write termination conditions inside parens and the copying code in the loop's body.

zb13y ago

Maybe the reason "C-hackers" do it that way is because it actually works, while your example completely fails to null-terminate the destination string.

dsego13y ago

Good point. I guess it gets a bit more complicated when you try to do the right thing. So I'll re-examine the simple while loop like oneeyedpigeon posted:

  while (*t++ = *f++)
      ;

This code really packs a lot of punch. The character that f points to is copied to where t points, and then both pointers are incremented. If the copied value is 0 the loop is terminated. So there's always at least one character copied.

In my initial rash response the loop would exit without copying the 0. So to fix it I might just add a new line

  *to = 0;

(if I'm not mistaken the pointer is already incremented when the loop exits).

Another option would be a while loop with a break statement. It looks weird, but does express the correct intent, which is "continuously copy from source string and exit if you've reached the end":

  while (true) {

    *to = *from;
    if (*to == 0) break;

    from++;
    to++;

  }
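To make the terminator discussion concrete, here is a minimal sketch of the loops from this sub-thread wrapped in functions. The function names, the `const` on `from`, and the `assert` guards are illustrative additions, not taken from any comment:

```c
#include <assert.h>
#include <stddef.h>

/* The first while loop from this sub-thread: it stops before copying
   the terminator, so the destination is left unterminated. */
void copy_without_terminator(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    while (*from != 0) {
        *to = *from;
        from++;
        to++;
    }
}

/* Same loop plus the one-line fix discussed above. */
void copy_with_fix(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    while (*from != 0) {
        *to = *from;
        from++;
        to++;
    }
    *to = 0;   /* `to` already points just past the last copied character */
}

/* The K&R idiom; the extra parentheses only silence compiler warnings
   about using an assignment as a condition. */
void copy_kr(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    while ((*to++ = *from++))
        ;
}
```

Copying `"hi"` into a buffer pre-filled with `'x'` leaves `h`, `i`, `x` with the first function and `h`, `i`, `\0` with the other two.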

jmoiron13y ago· 3 in thread

Unfortunately if you focus on beauty you sometimes break pragmatism. The JavaScript example in particular is not merely a typographical convention, but a way to avoid common errors.

    var i=1
      , j=2
      , k=3

You can remove any of the comma prefixed lines there (even the last one) and not introduce an error. You can add another similarly prefixed line anywhere to the list and not introduce an error. It's obvious if a comma is missing (which is good, because you don't have a compiler to let you know).

It can be difficult to spot the lack of a trailing comma, or the end of this declaration list having a comma instead of a semicolon (, vs ;), both of which will break the execution of your script.

So please, do not change your code to make it look better without understanding why it's like that in the first place.

The classic example is that changing the following to Allman/GNU style braces would break it in JavaScript:

    // works, returns {a:1,b:2}
    return {
        a: 1,
        b: 2
    }

    // semicolon inserted after return, returns undefined
    return
    {
        a: 1,
        b: 2
    }

spartango13y ago

Not to lean too heavily on the crutch of tooling, but it bothers me that this type of error can "slip through the cracks". Our tooling should make it patently obvious that this is a problem (before the code can be tested) if not automatically fixing it.

Indeed many tools (IDEs) do correct these kinds of problems, and it strikes me as silly that we have to worry about the execution of programs failing because of these types of typos/bugs.

ricardobeat13y ago

The article also states that the fact that a comma is required between variable declarations is one of the least important pieces of information in this code.

That is wrong. The comma is not incidental; it is a separator that tells you the next declaration is locally scoped. A missing comma changes the result of all following assignments.

rflrob13y ago

I don't know JavaScript, but I found it easy to interpret the commas as ditto marks, essentially reminding you that there's an implicit var before each declaration. Even without understanding the original reasons behind the convention, I'm not convinced the OP made an improvement.

LowKarmaAccount13y ago· 2 in thread

The elephant in the room is that many languages have adopted or have been influenced by the C language's miserable practices of ending a statement with a semicolon, and using the equals sign as an assignment operator. Both of these practices break with conventional usages to no discernible advantage.

Another impediment to readability is the insistence upon representing code by using hideous, low contrast colors on a dark background, especially when the code snippets are mixed with the conventional representation of black text on a white background.

My favorite quote about C's syntax was written by Erik Naggum: "If you care to know my opinion, I think semicolon-and-braces-oriented syntaxes suck and that it is a very, very bad idea to use them at all. It is far easier to write a parser for a syntax with the Lisp nature in any language than it is to write a parser for that stupid semiconcoction. Whoever decided to use the semicolon to _end_ something should just be taken out and have his colon semified. (At least COBOL and SQL managed to use a period.)"

yareally13y ago

> The elephant in the room is that many languages have adopted or have been influenced by the C language's miserable practices of ending a statement with a semicolon, and using the equals sign as an assignment operator. Both of these practices break with conventional usages to no discernible advantage.

The convention of = stems from mathematics and using it to assign input to variables there. Algebra probably being the first to use it about 1000 years ago.

wirrbel13y ago

Assignments != Equation

Assignments are pretty unmathematical since they represent memory operations and (usually) allow for mutation.

Wirth chose := in Pascal for a reason (with the comparison operator being = instead of == there, which is at least closer to the mathematical = operator).

andrus13y ago· 2 in thread

Would like to know whether the author feels his rewrite is more successful than the original. The following takes me longer to read:

    for (
                              ;
            (*to = *from) != 0;
            ++from, ++to
        )
        ;

Whereas the idiomatic version seems simpler:

    for (; (*to = *from) != 0; ++from, ++to);
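For anyone who wants to try both layouts, here is a small sketch that wraps each in a function. The function names, the `const` qualifier, and the `assert` guards are assumptions for illustration. Since C ignores the whitespace, both are the same token sequence and behave identically; only the typography differs:

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the article's rewrite, as quoted above. */
void copy_spread_out(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    for (
                              ;
            (*to = *from) != 0;
            ++from, ++to
        )
        ;
}

/* The idiomatic one-line layout. */
void copy_one_line(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    for (; (*to = *from) != 0; ++from, ++to);
}
```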

_mpu13y ago

I agree. I don't understand why people keep trying to deface C. This "indentation" scheme is by all means ridiculous and not practical. One thing we learn from this article is that the author actually does not program in C.

Luyt13y ago

When I saw that ridiculously formatted for() in the article, I began suspecting the article was meant as a joke.

jwarren13y ago· 2 in thread

I come from a photography and design background, and I've tended to naturally write code much like the author is suggesting.

However, the for loop is mystifying to me, but I don't fully understand the actual code there. Would anyone care to explain it?

maxerickson13y ago

Well, I guess you expect that it is copying something. It is implemented using C pointers, which enjoy a reputation of being difficult to explain. For example:

http://stackoverflow.com/questions/15151377/what-exactly-is-...

Accepting some level of fuzziness, the ++ parts are moving through two spots in memory, the * parts are copying the contents of the one to the other. It all goes 1 byte at a time.

(I apologize if my assumption that you are not familiar with C is off base)

gknoy13y ago

    for (; (*to = *from) != 0; ++from, ++to)

to and from are pointers to characters. When we increment them, we point to the next memory location in each string.

    "hello"  // a sequence of bytes with a zero marking the end 
     ^-- to

In our loop, we don't need to allocate any helper variables, and can just increment our pointers to point to the next memory slot in each string. I'm not a C expert, so I may get this subtly wrong, but I believe this is equivalent to:

    do {
        *to = *from;        // copy character from source to destination
        ++from;             // advance the source pointer
    } while (0 != *to++);   // stop once the '\0' terminator has been copied; advance the destination

chris_wot13y ago· 1 in thread

I positively hate that for loop rewrite. Having the semi-colon so expressly indented looks awful, especially on its own line.

I really feel that the for loop should be converted to a while loop - that would probably make the intent clearer.

homosaur13y ago

You unintentionally point toward another good reason for strict code formatting, which is if it promotes the actual structure of the code, sometimes it makes code smell very obvious. I'm in agreement with your assessment, which would not have been as obvious without the hideous formatting.

breadbox13y ago· 1 in thread

The C example is a great demonstration that typography is not an objective practice. That someone could take the original strcpy() and, with the express goal of improving its appearance, produce something so unpleasant to read ...

When I was younger I did a lot more interior alignment between lines, like with the list of variable declarations. Over the years I've found that they add a lot of busywork effort to the editing process. Everyone talks about optimizing for reading, but not optimizing for editing, which in some cases is really what you need to be optimizing for. But even laying that aside, the readability improvement of such spacing was debatable. Sometimes it's making visible a repetition (presumably one that couldn't be done with an actual loop), but a lot of times it feels more like a novelist who decided that the main verbs of their sentences should be vertically aligned. Agreed, it's making something visible -- but is that really what the average reader cares about? Ultimately the typography should not be a distraction from discerning the sense of a piece of code.

jmhain13y ago

No need to pick between optimization for reading and editing. You can have both by using tools like gofmt and astyle.

ebbv13y ago· 1 in thread

Given that we can't get developers to agree on where to place { I highly doubt we'd ever be able to settle on more esoteric formatting issues.

I'm also not sure that I want a developer spending much of their brain space or time prettying up the code beyond what is currently considered well formatted code. While it might be nice, there are also probably better things they could be working on.

dsego13y ago

You should read Code Complete 2. I was surprised but there are actually reasonable arguments against placing the opening brace on a new line. I don't have the book with me now, but I can look it up later if you're interested.

danbmil9913y ago· 1 in thread

I've always thought source code should use different fonts, and perhaps even non-monospace fonts in some cases (perhaps for strings, comments)

Why are we forced to stick with a single fixed-width font and color, limited use of italics, and no use of boldface?

Luyt13y ago

"Why are we forced to stick with a single fixed-width font"

Not necessarily ;-) Since proportional fonts became available in programmers' editors, I've been using them. See http://www.michielovertoom.com/incoming/desksnap-20101013.pn... for an example.

Sublime also allows proportional fonts, whereas TextMate does not.

gdubs13y ago· 1 in thread

Personally, I've always found code with a "table structure" prettier but less readable in practice. It's also fussy and time-consuming to maintain. I favor information density and flow.

numbsafari13y ago

As with what @Flenser said above, formatting code with "table" structure can result in a non-whitespace change on one line causing you to reformat the whitespace on unrelated lines, thus muddying your commit log.

PeterisP13y ago

The suggested transformations look nice for the trivial, tiny examples used, but would really hurt readability of code in general.

Code is not read linearly as a book - it is 'scanned' and reviewed back and forth; and the compactness of the code is important for readability.

Also, on currently standard (sadly) computers, especially laptops, vertical space is very restricted in standard LCD dimensions. If you spread out a screen of semantically linked code to two screens, then you suddenly can't grasp it all at once w/o scrolling through the pages back and forth, and that is a real loss. Newlines and empty lines can and must be used to group things in "paragraphs", but the OP suggestions waste far too many lines.

thedufer13y ago

While I like the idea of talking about typography in code, I don't think it's likely that we're as badly off as the post suggests. In fact, pretty much all of those examples make it harder for me to read.

>We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”

No. No no no. That is how you end up with people who can't parse `char* a, b;`. What you're doing is declaring that `* to` and `* from` are `char`s. The common reading that the post suggests is wrong and detrimental to reading C properly.

kstenerud13y ago

I'm having trouble deciding whether this is a serious post or a troll. The C "improvements" are particularly hideous.

orillian13y ago

Looking at this thread is why we need tools that let us structure code the way "WE" like it, but that outputs said code in a defaulted format.

Basically IDE formatting should be outside of the actual code formatting allowing people to format code how they see fit. Some are going to like to see the code the "standards" way and that's fine; Many of us will not.

@tmoertel's point is a prime example of this, as the only relationship I personally see in the aligned code is that they are all variables. His perceived parallelism was not present in my personal view of that same code snippet. Now for me white-space denotes parallelism more poignantly than alignment.

  var  x                  = shape.left(),
         y                  = shape.right(),
                                                           <--**
         numSides = shape.sides();

meh, I give up trying to format this code block. :P
*This denotes parallels between variables to me. Not the alignment.

The only parallel that is drawn for me is that all the items are variables (this would also be reinforced by the color I give variables). I group things with white-space; so that means in the context of the function using the variables as I've laid them out above; x,y are related and most likely linked, where numSides is needed but not necessarily associated inside the function with the x,y variables.

So ya, this stuff is very subjective and as @tmoertel said "subtle and tricky".

Flenser13y ago

If you also consider what commit diffs will look like if you have to insert or delete a variable, (particularly the last one) then comma first creates less cruft in your history.

wirrbel13y ago

I think looking at musical typesetting is even closer to the problem of typesetting source code. lilypond.org does explain some ideas about musical typesetting. For example when and when not to align notes, etc.

The OP does seem to favour a table-like grid layout for source code, yet I rather feel like source code describes hierarchical trees (that is why we like the indenting). Arranging things in a tabular way is not principally beneficial. Sometimes you encounter things like this:

    int                x =       get_width();
    const long double  y = 1.5 * get_width();

But what use is the aligning? Type, identifier and value of `x` are far apart, so it is easy to slip onto the wrong line here.

Now most programming languages have a big problem with indentation and layout because their syntax is weird. C's habit of putting types in front, etc. That is probably why the GNU C styleguide is proposing something like this:

    int
    strcpy( ....

Type and identifier on separate lines. This style is not widely adopted, and that probably shows another aspect of typography: What is typographically correct depends on what is common.
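For readers who have not seen that layout, here is a minimal sketch of it in C. The function `max_int` is a made-up example, not taken from the article or from any GNU source; the only point is that the return type sits on its own line so the function name starts in column one, which makes a definition easy to find with a search such as `^max_int`.

```c
/* GNU-style definition: the return type goes on its own line so that the
   function name starts in column one. max_int is a hypothetical example. */
int
max_int (int a, int b)
{
  return a > b ? a : b;
}
```

Whether this reads better than `int max_int(int a, int b)` on one line is exactly the kind of thing that depends on what you are used to.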

_m_a_u_r_i_c_e_13y ago

A comma may not be so important in a language between humans, like here --->, but in a programming language (a human-human AND human-computer medium) a comma is often an important separator. Missing it often breaks correctness, so putting it first on the line based on its importance is OK for me. No syntax-checking tool is needed to see that it's OK...

zwieback13y ago

I think more important than a particular typographic style is consistency. Most of us move between large codebases that are formatted differently and I find that I can quickly adapt to a different style as long as it's consistent.

Also, I think color is really useful and something that's not used as much in traditional typography.

jamesaguilar13y ago

It's better to just use an autoformatter like clang-format. The minuscule time you'll save reading the code with clever formatting is going to be wasted in arguments with fellow contributors about how clever the formatting is and whether it's appropriate.

Chris_Newton13y ago

We could definitely do a better job with source code typography, but if the goal is to make the code more readable, I think the points discussed here are setting the bar pretty low for what we could achieve instead of today’s norms.

For example, even fairly basic textbooks make extensive use of footnotes, endnotes, marginal notes, sidebars, illustrations, tabular material, bibliographies, and other similar tools. What these all have in common is that they remove valuable supporting information from the main text, but keep it available (with varying degrees of immediacy) for readers who want to refer to it, and provide conventions to help the reader find it. In some cases, they also present that material in a structured way that is more effective than plain text.

With modern interactive IDEs, we have the ability to use not only all the same tricks that traditional book typesetting does but also many more because ours can be dynamic, interactive, graphical, or any combination of the above. We can add supporting material on any of four sides around a code listing, or even overlay information on the listing itself, and we can change the information we show in those places automatically with context and/or at the reader’s request based on what they want to do next. And yet, except for debugging and for navigating around large projects, we tend to make very little use of these tools. We still mostly try to present code as a single static column of monospaced material with the occasional extra vertical space and a bit of horizontal alignment to emphasize structure, and maybe a left sidebar with line numbers and a couple of icons for bookmarks or breakpoints, or perhaps a bit of trivial dynamic highlighting during things like find/replace work.

What if instead we tried to make the main code area focus on the core logic and data, and move anything else out of the way? There are many opportunities to do that. Type annotations? Supporting data. Comments? Supporting data. Long list of module.names.before.useful.identifier? Probably supporting data as long as it’s unambiguous, probably something you want to draw attention to if it’s not. Even keywords like ‘var’ in the example code snippets don’t help a human reader much. And that’s just with the kind of conventions and languages we use today, without even considering the endless possibilities of languages and tools designed with alternative presentation in mind from the start.

None of this even needs to get in the way of the tried and tested practice of storing code in plain text files. It could all be dealt with in your editor/IDE using exactly the same kinds of techniques as the standardised-formatting tools other posters have mentioned here, and saving a file could record the supporting data in a standardised text format that is friendly to version control systems, code review tools, automated diffs, and so on.

In short, if we’re going to address readability and presentation in programming, let’s think outside the box a bit. We have the most powerful information sharing and presentation tools in human history at our disposal, and a mountain of data about usability and interface design. We can do more than worrying about how many spaces we should have between tab stops.


egypturnash13y ago

There's a big ol' elephant in the middle of the room that this post is not addressing.

The irrational love programmers have for horrible monospace fonts.

jimhefferon13y ago

My eyes aren't great, but I had trouble reading the code.



JungleGymSam13y ago

I'm a layman so I ask this question with sincerity. Would your opinion change if the formatting was performed manually, at the moment of your choosing?

Fa773NM0nK13y ago

"formatting was performed manually"? What do you mean by that?

I'd rather it be performed automatically. But, I find very few tools that would format it the way I want.


tmoertel13y ago· 4 in thread

Typography is a subtle and tricky thing. What seems “better” may actually provide misleading cues.

For example, consider the alignment in the following, improved snippet from the blog post:

    var x        = shape.left(),
        y        = shape.right(),
        numSides = shape.sides();

It may be “better” typographically, but it also suggests a false parallelism.

                   { e }
         { v }     { x }
    var  { a }  =  { p } ;
         { r }     { r }
         { s }     { s }

The less-formatted original version below actually represents the reality more faithfully because its shape does not suggest parallelism:

    var x = shape.left(),
        y = shape.right(),
        numSides = shape.sides();

It looks more like a sequence, which it is.

Indeed, typography is a subtle and tricky thing.

prawks13y ago

Typography doesn't address /what/ should be written but rather /how/ it should be presented to make what was written as readable as possible.

I think this hints at what you're saying, though the author's presentation style may imply that his is the way formatting needs to be done.

As with anything, I don't think blindly adhering to example is a good idea; this case is no different.

You're dead-on that typography is subtle and tricky!

tmoertel13y ago

The author's intent doesn't matter at that point. The typographical treatment has already reached the reader's eyeballs.

    let x      = y + 1
        y      = foo 0
        foo i  = max 0 (i - 1)
    in ...

So this treatment adds clarity, not takes it away.

[1] http://www.haskell.org/onlinereport/exps.html#sect3.12

astrodust13y ago

The kind of people that line up their assignments and other syntax elements of the same sort are the ones that prefer justified text, even in inappropriate cases. It's annoying.

Keep them tight, learn to read code that way. Code is not ASCII art.

Luyt13y ago

georgef13y ago· 4 in thread

When I first started programming in C, I did the same thing with pointers, i.e.,

    char* str;

instead of

    char *str;

Unfortunately, this creates the wrong impression that

    char* str1, str2;

declares two pointers, when in fact only str1 is a pointer; str2 is a plain char.

The clearest way I've ever found to think about C declarations (ironically, I think I read this in some article maligning C's syntax in favor of Go's), is that each declaration is of the format

    [type] [expressions--one for each new variable--that equate to type];

Thus, the way I think of declaring a char pointer is

    char [de-referencing the variable (which is a char pointer) to arrive at the char];
    char *str;

(Obviously, not everyone is going to agree that that is simple, but it works for me and my brain.)
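A small sketch that lets the compiler confirm this reading. It assumes a C11 compiler for `_Generic`; the macro `IS_CHAR_POINTER` and the two helper functions are names invented for this illustration.

```c
/* Written in the "char* " style on purpose: the star still binds to the
   declarator str1 only, so str2 is a plain char. */
static char* str1, str2;

/* C11 _Generic picks a branch based on the static type of its argument.
   IS_CHAR_POINTER is a hypothetical helper for this example. */
#define IS_CHAR_POINTER(x) _Generic((x), char *: 1, default: 0)

/* Returns 1: str1 has type "pointer to char". */
int str1_is_pointer(void) { return IS_CHAR_POINTER(str1); }

/* Returns 0: str2 has type "char", despite the spacing of the declaration. */
int str2_is_pointer(void) { return IS_CHAR_POINTER(str2); }
```

The spacing around the star makes no difference to the compiler, which is why the `char* str` style can mislead a reader about the second variable.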

Also, I hope I never write a for loop that looks so massively bloated--a matter of opinion I guess.

dsego13y ago

What is wrong with having str1 and str2 declared on separate lines? Like this:

  char* str1;
  char* str2;

What is gained with combining these declarations (except for less typing)?

georgef13y ago

Less typing is exactly the benefit, and with it less chance of error because of less repeated code. It is also another way of showing parallelism in your code.

And again, the wrong impression is conveyed (that you are declaring a variable of type char* instead of a pointer type that dereferences to a char).

With more complex types, not understanding what is really going on makes code incredibly opaque (and that typographical style ad hoc). What would you do with this:

    char *((*func)(char *));

With the convention used in the article, the declaration would be something like this:

    char* ((* func)(char*));

Which doesn't make clear why the outside parentheses are needed, or why there is a dereference (or multiplication?) operator in front of the variable name.
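To make that declaration concrete, here is a hedged sketch of how it could be used. The function `skip_first`, the typedef `str_fn` and the two `call_through_*` wrappers are all hypothetical names chosen for this example. Note that the outermost pair of parentheses is optional; `char *(*func)(char *)` declares the same thing.

```c
#include <stddef.h>

/* A hypothetical function with the matching signature:
   takes a char pointer, returns a char pointer. */
static char *skip_first(char *s)
{
    return (s != NULL && *s != '\0') ? s + 1 : s;
}

/* The declaration from the comment above: dereferencing func gives a
   function that takes char * and returns char *. */
static char *((*func)(char *)) = skip_first;

/* The same pointer type spelled with a typedef. */
typedef char *(*str_fn)(char *);

/* Call through the pointer by dereferencing it, mirroring the declaration. */
char *call_through_pointer(char *s) { return (*func)(s); }

/* Call through a typedef'd copy of the pointer; the explicit '*' is optional. */
char *call_through_typedef(char *s)
{
    str_fn f = func;
    return f(s);
}
```

Read this way, the declaration mirrors the call: `(*func)(s)` is an expression of type `char *`.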

limmeau13y ago

Grouping comes to mind, e.g.

        int x, y;
        int width, height;
        int area;
        int numPoints;

_mpu13y ago

Less typing, fewer lines, more information on the screen and the readability is not impaired.

davidw13y ago· 4 in thread

On a somewhat related subject, one thing I've always idly wondered about Lisp and typography is how the curvy parens, together with not much indentation, make it hard to line stuff up vertically.

A random elisp example:

    (while (> count 0)
      (re-search-backward regexp bound)
      (when (and (> (point) (point-min))
                 (save-excursion (backward-char) (looking-at "/[/*]")))
        (forward-char))
      (setq parse (parse-partial-sexp saved-point (point)))
      (cond ((nth 3 parse)
             (re-search-backward
              (concat "\\([^\\]\\|^\\)" (string (nth 3 parse))) 
              (save-excursion (beginning-of-line) (point)) t))
            ((nth 7 parse) 
             (goto-char (nth 8 parse)))
            ((or (nth 4 parse)
                 (and (eq (char-before) ?/) (eq (char-after) ?*)))
             (re-search-backward "/\\*"))
            (t
             (setq count (1- count))))))

See where the ends of the parentheses point? Not straight up or down, but diagonally.

Just a random thought... maybe it's just me.

akkartik13y ago

Yeah, I agree. Lisp is best with small functional blocks.

As function size grows, it doesn't look quite so pretty. An example from news.arc:

  (if no.user
      (submit-login-warning url title showtext text)
     (~and (or blank.url valid-url.url)
           ~blank.title)
      (submit-page user url title showtext text retry*)
     (len> title title-limit*)
      (submit-page user url title showtext text toolong*)
     (and blank.url blank.text)
      (let dummy 34
        (submit-page user url title showtext text bothblank*))
     (let site sitename.url
       (or big-spamsites*.site recent-spam.site))
      (msgpage user spammage*)
     (oversubmitting user ip 'story url)
      (msgpage user toofast*)
     (let s (create-story url process-title.title text user ip)
       (story-ban-test user s ip url)
       (when ignored.user (kill s 'ignored))
       (submit-item user s)
       (maybe-ban-ip s)
       "newest"))

There's a multi-branch if here, with lots of else ifs. But it's hard to see. In my toy dialect[1], I've added[2] colons as syntactic sugar that expands to nothing:

  (if no.user
      : (submit-login-warning url title showtext text)
     (~and (or blank.url valid-url.url)
           ~blank.title)
      : (submit-page user url title showtext text retry*)
     (len> title title-limit*)
      : (submit-page user url title showtext text toolong*)
     (and blank.url blank.text)
      : (let dummy 34
          (submit-page user url title showtext text bothblank*))
     (let site sitename.url
       (or big-spamsites*.site recent-spam.site))
      : (msgpage user spammage*)
     (oversubmitting user ip 'story url)
      : (msgpage user toofast*)
     : (let s (create-story url process-title.title text user ip)
         (story-ban-test user s ip url)
         (when ignored.user (kill s 'ignored))
         (submit-item user s)
         (maybe-ban-ip s)
         "newest"))

[1] https://github.com/akkartik/wart#readme

[2] http://arclanguage.org/item?id=16495

limmeau13y ago

Racket (and maybe other Schemes) lets you interchange [] and () freely. Helps a bit. e.g.

         (cond
           [(positive? -5) (error "doesn't get here")]
           [(zero? -5) (error "doesn't get here, either")]
           [(positive? 5) 'here])


brudgers13y ago

Finally, the code snippet does not appear to be the output of a pretty printer. Instead the typography is based on considerations beyond readability.

prawks13y ago

dsego13y ago· 4 in thread

  while (*from != 0) {
    *to = *from;
    from++;
    to++;
  }

  while ((*t++ = *f++) != 0)
    ;

(Note, I don't really know if this is correct.)

oneeyedpigeon13y ago

Yes, a for loop is ridiculous for a string copy; K&R do it like so:

  while (*t++ = *f++)
      ;

dsego13y ago

zb13y ago

Maybe the reason "C-hackers" do it that way is because it actually works, while your example completely fails to null-terminate the destination string.

dsego13y ago

Good point. I guess it gets a bit more complicated when you try to do the right thing. So I'll re-examine the simple while loop like oneeyedpigeon posted:

  while (*t++ = *f++)
      ;

This code really packs a lot of punch. The character that f points to is copied to where t points, and then both pointers are incremented. If the copied value is 0 the loop terminates. So at least one character is always copied, even if it is only the terminator.

In my initial rash response the loop would exit without copying the 0. So to fix it I might just add a new line

  *to = 0;

(if I'm not mistaken the pointer is already incremented when the loop exits).

Another option would be a while loop with a break statement. It looks weird, but does express the correct intent, which is "continuously copy from source string and exit if you've reached the end":

  while (true) {

    *to = *from;
    if (*to == 0) break;

    from++;
    to++;

  }
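Putting the three variants from this subthread side by side, as a sketch: the function names are invented here, and each function assumes the caller has made the destination buffer large enough for the source string including its terminator. All three should leave the same bytes in the destination.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit loop plus the "*to = 0" fix: copy up to the terminator,
   then write the terminator separately. */
void copy_explicit(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    while (*from != 0) {
        *to = *from;
        from++;
        to++;
    }
    *to = 0; /* to now points just past the last copied character */
}

/* K&R idiom: the value of the assignment is the character just copied,
   so the loop stops right after the terminator has been written. */
void copy_idiom(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    while ((*to++ = *from++) != 0)
        ;
}

/* Loop with break: copy first, then decide whether that was the end. */
void copy_break(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    for (;;) {
        *to = *from;
        if (*to == 0)
            break;
        from++;
        to++;
    }
}
```

The first version tests before copying, which is why it needs the extra line; the other two copy first and test afterwards, so the terminator comes along for free.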

jmoiron13y ago· 3 in thread

Unfortunately if you focus on beauty you sometimes break pragmatism. The JavaScript example in particular is not merely a typographical convention, but a way to avoid common errors.

    var i=1
      , j=2
      , k=3

It can be difficult to spot the lack of a trailing comma, or the end of this declaration list having a comma instead of a semicolon (, vs ;), both of which will break the execution of your script.

So please, do not change your code to make it look better without understanding why it's like that in the first place.

The classic example is that changing the following to Allman/GNU style braces would break it in JavaScript:

    // works, returns {a:1,b:2}
    return {
        a: 1,
        b: 2
    }

    // semicolon inserted after return, returns undefined
    return
    {
        a: 1,
        b: 2
    }

spartango13y ago

Indeed many tools (IDEs) do correct these kinds of problems, and it strikes me as silly that we have to worry about the execution of programs failing because of these types of typos/bugs.

ricardobeat13y ago

The article also states that the fact that a comma is required between variable declarations is one of the least important pieces of information in this code.

That is wrong. The comma is not incidental, it is an operator that tells you the next declaration is locally scoped. A missing comma changes the result of all following assignments.

rflrob13y ago

LowKarmaAccount13y ago· 2 in thread

yareally13y ago

The convention of = stems from mathematics and using it to assign input to variables there. Algebra probably being the first to use it about 1000 years ago.

wirrbel13y ago

Assignments != Equation

Assignments are pretty unmathematical since they represent memory operations and (usually) allow for mutation.

Wirth chose := in Pascal for a reason (with the comparison operator being = instead of == there, which is at least closer to the mathematical = operator).
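A tiny C sketch of the distinction, with function names made up for the example. Read as an equation, x = x + 1 has no solution; read as an assignment, it is an ordinary memory update. The example assumes x is well below INT_MAX so that x + 1 does not overflow.

```c
/* Assignment: store a new value in x. As an equation this line would be
   nonsense, since no integer equals its own successor. */
int increment(int x)
{
    x = x + 1;
    return x;
}

/* Comparison: == asks whether the two sides are equal and changes nothing.
   For any x where x + 1 does not overflow, the answer is 0 (false). */
int equals_own_successor(int x)
{
    return x == x + 1;
}
```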

andrus13y ago· 2 in thread

Would like to know whether the author feels his rewrite is more successful than the original. The following takes me longer to read:

    for (
                              ;
            (*to = *from) != 0;
            ++from, ++to
        )
        ;

Whereas the idiomatic version seems simpler:

    for (; (*to = *from) != 0; ++from, ++to);

_mpu13y ago

Luyt13y ago

When I saw that ridiculously formatted for() in the article, I began suspecting the article was meant as a joke.

jwarren13y ago· 2 in thread

I come from a photography and design background, and I've tended to naturally write code much like the author is suggesting.

However, the for loop is mystifying to me, but I don't fully understand the actual code there. Would anyone care to explain it?

maxerickson13y ago

Well, I guess you expect that it is copying something. It is implemented using C pointers, which enjoy a reputation of being difficult to explain. For example:

http://stackoverflow.com/questions/15151377/what-exactly-is-...

Accepting some level of fuzziness, the ++ parts are moving through two spots in memory, the * parts are copying the contents of the one to the other. It all goes 1 byte at a time.

(I apologize if my assumption that you are not familiar with C is off base)

gknoy13y ago

    for (; (*to = *from) != 0; ++from, ++to)

to and from are pointers to characters. When we increment them, we point to the next memory location in each string.

    "hello"  // a sequence of bytes with a zero marking the end 
     ^-- to

    do {
        *to++ = *from++;        // copy one character, then advance both pointers
    } while (0 != to[-1]);      // until the character just copied was the '\0' terminator
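If the pointer arithmetic is the confusing part, the same for loop can be written with an array index instead. This is only a sketch: `copy_indexed` is an invented name, and it assumes the destination has room for the whole source string plus its terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Same copy as the for loop above, but with an index instead of moving
   pointers. Returns the number of characters copied, not counting '\0'. */
size_t copy_indexed(char *to, const char *from)
{
    size_t i;
    assert(to != NULL && from != NULL);
    /* Copy character i; if it was the terminator, stop, else go to i + 1. */
    for (i = 0; (to[i] = from[i]) != 0; ++i)
        ;
    return i;
}
```

Copying "hello" this way runs the assignment six times (five letters plus the terminator) and returns 5.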

chris_wot13y ago· 1 in thread

I positively hate that for loop rewrite. Having the semi-colon so expressly indented looks awful, especially on its own line.

I really feel that the for loop should be converted to a while loop - that would probably make the intent clearer.

homosaur13y ago

breadbox13y ago· 1 in thread

jmhain13y ago

No need to pick between optimization for reading and editing. You can have both by using tools like gofmt and astyle.

ebbv13y ago· 1 in thread

Given that we can't get developers to agree on where to place { I highly doubt we'd ever be able to settle on more esoteric formatting issues.

dsego13y ago

danbmil9913y ago· 1 in thread

I've always thought source code should use different fonts, and perhaps even non-monospace fonts in some cases (perhaps for strings, comments)

Why are we forced to stick with a single fixed-width font and color, limited use of italics, and no use of boldface?

Luyt13y ago

"Why are we forced to stick with a single fixed-width font"

Not necessarily ;-) Since proportional fonts became available in programmers' editors, I've been using them. See http://www.michielovertoom.com/incoming/desksnap-20101013.pn... for an example.

Sublime also allows proportional fonts, whereas TextMate does not.

gdubs13y ago· 1 in thread

Personally, I've always found code with a "table structure" prettier but less readable in practice. It's also fussy and time-consuming to maintain. I favor information density and flow.

numbsafari13y ago

PeterisP13y ago

The suggested transformations look nice for the trivial, tiny examples used, but would really hurt readability of code in general.

Code is not read linearly like a book - it is 'scanned' and reviewed back and forth; and the compactness of the code is important for readability.

thedufer13y ago

>We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”

kstenerud13y ago

I'm having trouble deciding whether this is a serious post or a troll. The C "improvements" are particularly hideous.

orillian13y ago

Looking at this thread is why we need tools that let us structure code the way "WE" like it, but that outputs said code in a defaulted format.
