occasional? Seriously? rare? Seriously?
"Source code should be written to be understood by people." Nope. Source code should be written to be executed. If people can understand it easily, it's a plus, not a baseline.
Considering that the keyboard is the primary way we write source code, I find it difficult enough to keep my fingers up to speed with my thoughts. If, on top of that, I have to press Tab to align each of the statements in my 'for' loops, I'll be much worse off.
This viewpoint is described well in the Preface to the First Edition of Structure and Interpretation of Computer Programs, 2nd paragraph:
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-7.html
I would suggest reading the whole Preface. It's quite an inspirational work for this field.
Enjoyed it! Thanks!
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-5.html
Not that the contents of the Preface should be ignored :)
I'd rather it be performed automatically, but I find very few tools that would format it the way I want.
For example, consider the alignment in the following, improved snippet from the blog post:
var x        = shape.left(),
    y        = shape.right(),
    numSides = shape.sides();
It may be “better” typographically, but it also suggests a false parallelism.
The eye can't help but interpret closely packed things as groups. So the subliminal cue presented by the formatting above is that there is a parallel assignment from the group of expressions on the right to the group of variables on the left. That is, at some level the eye can't help but see the code above as
            { e }
    { v }   { x }
var { a } = { p } ;
    { r }   { r }
    { s }   { s }
But the evaluation and assignment are not parallel! They are sequential. The difference may not matter in this example, but it's easy to imagine this kind of formatting applied to examples where it would.
The less-formatted original version below actually represents the reality more faithfully because its shape does not suggest parallelism:
var x = shape.left(),
    y = shape.right(),
    numSides = shape.sides();
It looks more like a sequence, which it is.
Indeed, typography is a subtle and tricky thing.
I think this hints at what you're saying, though the author's presentation style may imply that his is the way formatting needs to be done.
"What was written" is analogous to "what the author meant". If the author makes the assumption that those operations may be executed in parallel, regardless of order, then the example typesetting may be appropriate. "Better" typography shouldn't mean that it looks pretty, as in artistic. It should be more readable, without getting in the way of conveying the author's intent. It should aid the reader in reading, and in reading, arriving at the author's meaning.
As with anything, I don't think blindly adhering to example is a good idea; this case is no different.
You're dead-on that typography is subtle and tricky!
The author's intent doesn't matter at that point. The typographical treatment has already reached the reader's eyeballs.
If the language were Haskell, however, the align-on-equals treatment would actually communicate the truth. For example, the following code means exactly what its visual presentation suggests it means (see [1]):
let x     = y + 1
    y     = foo 0
    foo i = max 0 (i - 1)
in  ...
So this treatment adds clarity rather than taking it away.
You also end up with "floaters": if `numSides` is removed in this case, the x and y assignments are left with a needless number of spaces. These can be corrected, but you'll also inherit "blame" for the change, which is misinformation.
Keep them tight, learn to read code that way. Code is not ASCII art.
char* str;
instead of char *str;
).
Unfortunately, this creates the wrong impression that
char* str1, str2;
creates two pointer-to-char variables, whereas actually str1 is a char pointer and str2 is simply a char. Indeed, I was confused on this point myself when I was a newbie, which led to great confusion later on.
The clearest way I've ever found to think about C declarations (ironically, I think I read this in some article maligning C's syntax in favor of Go's) is that each declaration is of the format
[type] [expressions--one for each new variable--that equate to type];
Thus, the way I think of declaring a char pointer is char [de-referencing the variable (which is a char pointer) to arrive at the char];
char *str;
(Obviously, not everyone is going to agree that that is simple, but it works for me and my brain.)
Anyway, the point is that while typography is great, it can be just as harmful as helpful if you communicate the wrong impression to the reader. And to be fair, it really looks like the author is not at home in C: besides his misconception about pointer declaration, he didn't bat an eye at the old-style argument-declaration syntax that has been obsolete since ANSI C.
Also, I hope I never write a for loop that looks so massively bloated--a matter of opinion I guess.
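A minimal sketch of the `char* str1, str2;` pitfall described above, assuming a C11 compiler (for `_Generic` and `static_assert`). The macro `IS_CHAR_PTR` and the function `count_char_pointers` are hypothetical names made up for this illustration; they are not from the article or the thread.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical helper: evaluates to 1 if x has type char *, otherwise 0. */
#define IS_CHAR_PTR(x) _Generic((x), char *: 1, default: 0)

/* Counts how many of the two variables declared on one line are pointers. */
int count_char_pointers(void)
{
    char* str1 = 0, str2 = 0;   /* the * binds to str1 only */

    /* str2 is a plain char, so its size is 1 by definition. */
    static_assert(sizeof str2 == 1, "str2 is a char, not a char *");

    return IS_CHAR_PTR(str1) + IS_CHAR_PTR(str2);
}
```

Calling `count_char_pointers()` returns 1 rather than 2, because only `str1` is a pointer, whatever the spacing around the `*` suggests.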
char* str1;
char* str2;
What is gained with combining these declarations (except for less typing)?
And again, the wrong impression is conveyed (that you are declaring a variable of type char* instead of a pointer type that dereferences to a char).
With more complex types, not understanding what is really going on makes code incredibly opaque (and that typographical style ad hoc). What would you do with this:
char *((*func)(char *));
That isn't a char pointer at all. It's a pointer to a function that takes a char pointer as an argument and returns (i.e., evaluates to) a char pointer. It looks incredibly dense (to me, at least), unless I think of it in the way I talked about above (in which case it all makes sense and is kind of cool).
With the convention used in the article, the declaration would be something like this:
char* ((* func)(char*));
Which doesn't make clear why the outside parentheses are needed, or why there is a dereference (or multiplication?) operator in front of the variable name.
The idea of a variable declaration being an expression that evaluates to a basic type is actually the reason why the same symbol * is used in the declaration and in the dereferencing of pointers: they mean the same thing.
Basically, what I am trying to say is that the typographical practice used in the article is at odds with the actual meaning of the statement (and the explanation he gives shows he clearly does misunderstand the statement), which can only lead to confusion in the long run, especially when you encounter code written by other people. Or, as with your example, it can lead to eschewing a useful feature of a language simply because it doesn't look pretty according to your arbitrary whitespace conventions.
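A short sketch of the "declaration mirrors use" reading applied to `char *((*func)(char *));`, assuming a C11 compiler. The functions `skip_first` and `second_char` are hypothetical, introduced only so that there is something to point `func` at.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical function with a matching signature: takes a char pointer and
   returns a char pointer (here, a pointer to the second character). */
static char *skip_first(char *s)
{
    return s + 1;
}

/* Returns the second character of text by going through a function pointer.
   Assumes text holds at least one character before its terminator. */
char second_char(char *text)
{
    char *((*func)(char *)) = skip_first;   /* the declaration from the thread */

    /* Used exactly as declared, the expression is a char:
       dereference func, call it with a char *, dereference the result. */
    static_assert(sizeof *((*func)(text)) == 1, "the expression has type char");

    return *((*func)(text));
}
```

The expression in the `return` statement has the same shape as the declarator, which is the point being made above: written out as an expression, the declaration arrives at a `char`.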
int x, y;
int width, height;
int area;
int numPoints;
A random elisp example:
(while (> count 0)
  (re-search-backward regexp bound)
  (when (and (> (point) (point-min))
             (save-excursion (backward-char) (looking-at "/[/*]")))
    (forward-char))
  (setq parse (parse-partial-sexp saved-point (point)))
  (cond ((nth 3 parse)
         (re-search-backward
          (concat "\\([^\\]\\|^\\)" (string (nth 3 parse)))
          (save-excursion (beginning-of-line) (point)) t))
        ((nth 7 parse)
         (goto-char (nth 8 parse)))
        ((or (nth 4 parse)
             (and (eq (char-before) ?/) (eq (char-after) ?*)))
         (re-search-backward "/\\*"))
        (t
         (setq count (1- count))))))
See where the ends of the parentheses point? Not straight up or down, but diagonally.
Just a random thought... maybe it's just me.
"It used to be thought that you could judge someone's character by looking at the shape of his head. Whether or not this is true of people, it is generally true of Lisp programs. Functional programs have a different shape from imperative ones. The structure in a functional program comes entirely from the composition of arguments within expressions, and since arguments are indented, functional code will show more variation in indentation. Functional code looks fluid on the page; imperative code looks solid and blockish, like Basic." -- On Lisp (http://www.paulgraham.com/onlisptext.html; page 30)
As function size grows, it doesn't look quite so pretty. An example from news.arc:
(if no.user
     (submit-login-warning url title showtext text)
    (~and (or blank.url valid-url.url)
          ~blank.title)
     (submit-page user url title showtext text retry*)
    (len> title title-limit*)
     (submit-page user url title showtext text toolong*)
    (and blank.url blank.text)
     (let dummy 34
       (submit-page user url title showtext text bothblank*))
    (let site sitename.url
      (or big-spamsites*.site recent-spam.site))
     (msgpage user spammage*)
    (oversubmitting user ip 'story url)
     (msgpage user toofast*)
    (let s (create-story url process-title.title text user ip)
      (story-ban-test user s ip url)
      (when ignored.user (kill s 'ignored))
      (submit-item user s)
      (maybe-ban-ip s)
      "newest"))
There's a multi-branch if here, with lots of else ifs. But it's hard to see. In my toy dialect[1], I've added[2] colons as syntactic sugar that expands to nothing:
(if no.user
  : (submit-login-warning url title showtext text)
    (~and (or blank.url valid-url.url)
          ~blank.title)
  : (submit-page user url title showtext text retry*)
    (len> title title-limit*)
  : (submit-page user url title showtext text toolong*)
    (and blank.url blank.text)
  : (let dummy 34
      (submit-page user url title showtext text bothblank*))
    (let site sitename.url
      (or big-spamsites*.site recent-spam.site))
  : (msgpage user spammage*)
    (oversubmitting user ip 'story url)
  : (msgpage user toofast*)
  : (let s (create-story url process-title.title text user ip)
      (story-ban-test user s ip url)
      (when ignored.user (kill s 'ignored))
      (submit-item user s)
      (maybe-ban-ip s)
      "newest"))
[1] https://github.com/akkartik/wart#readme
(cond
  [(positive? -5) (error "doesn't get here")]
  [(zero? -5) (error "doesn't get here, either")]
  [(positive? 5) 'here])
A second feature appears to be optimizing the layout to display in "forty lines." There are places where the lines could be shorter but aren't. Not abstracting the strings falls somewhat into this category.
Finally, the code snippet does not appear to be the output of a pretty printer. Instead the typography is based on considerations beyond readability.
while (*from != 0) {
    *to = *from;
    from++;
    to++;
}
To me this looks much saner (unless I'm doing something wrong, I'm a bit rusty on pointers). But I notice that a lot of "C-hackers" try to cram as much into one line as they can, often including every possible pointer increment and assignment. At least here, the variables are properly named. A lot of C code uses one-letter variables and reading those isn't a lot of fun, e.g.
while((*t++ = *f++) != 0 )
(Note, I don't really know if this is correct.)
while (*t++ = *f++)
    ;
The pointless comparison to 0 is removed, and the semicolon is required to terminate the statement. Of course, the point about this version is that, once you understand it, it's trivial to recognise. Parsing the 'full' version, especially as formatted in the article, takes a lot longer because it's not familiar and contains more parts to read, any of which could deviate from what might be expected. The other positive is that much more code is visible at once.
while (*t++ = *f++)
    ;
This code really packs a lot of punch. The character that f points to is copied to where t points, and then both pointers are incremented. If the copied value is 0 the loop terminates. So there's always at least one character copied.
In my initial rash response the loop would exit without copying the 0. So to fix it I might just add a new line
*to = 0;
(if I'm not mistaken the pointer is already incremented when the loop exits).
Another option would be a while loop with a break statement. It looks weird, but does express the correct intent, which is "continuously copy from source string and exit if you've reached the end":
while (true) {
    *to = *from;
    if (*to == 0) break;
    from++;
    to++;
}
var i=1
  , j=2
  , k=3
You can remove any of the comma-prefixed lines there (even the last one) and not introduce an error. You can add another similarly prefixed line anywhere to the list and not introduce an error. It's obvious if a comma is missing (which is good, because you don't have a compiler to let you know).
It can be difficult to spot the lack of a trailing comma, or the end of this declaration list having a comma instead of a semicolon (, vs ;), both of which will break the execution of your script.
So please, do not change your code to make it look better without understanding why it's like that in the first place.
The classic example is that changing the following to Allman/GNU-style braces would break it in JavaScript:
// works, returns {a:1,b:2}
return {
    a: 1,
    b: 2
}
// semicolon inserted after return, returns undefined
return
{
    a: 1,
    b: 2
}
Indeed many tools (IDEs) do correct these kinds of problems, and it strikes me as silly that we have to worry about the execution of programs failing because of these types of typos/bugs.
That is wrong. The comma is not incidental; it is the separator that tells you the next declaration is locally scoped. A missing comma changes the result of all following assignments.
Another impediment to readability is the insistence upon representing code by using hideous, low contrast colors on a dark background, especially when the code snippets are mixed with the conventional representation of black text on a white background.
My favorite quote about C's syntax was written by Erik Naggum: "If you care to know my opinion, I think semicolon-and-braces-oriented syntaxes suck and that it is a very, very bad idea to use them at all. It is far easier to write a parser for a syntax with the Lisp nature in any language than it is to write a parser for that stupid semiconcoction. Whoever decided to use the semicolon to _end_ something should just be taken out and have his colon semified. (At least COBOL and SQL managed to use a period.)"
The convention of = stems from mathematics, where it is used to bind values to variables. The sign itself was introduced by Robert Recorde in 1557, although algebra is far older.
Assignments are pretty unmathematical since they represent memory operations and (usually) allow for mutation.
Wirth chose := in Pascal for a reason (with the comparison operator being = instead of == there, which is at least closer to the mathematical = operator).
for (
    ;
    (*to = *from) != 0;
    ++from, ++to
    )
    ;
Whereas the idiomatic version seems simpler:
for (; (*to = *from) != 0; ++from, ++to);
However, the for loop is mystifying to me; I don't fully understand the actual code there. Would anyone care to explain it?
http://stackoverflow.com/questions/15151377/what-exactly-is-...
Accepting some level of fuzziness, the ++ parts are moving through two spots in memory, the * parts are copying the contents of the one to the other. It all goes 1 byte at a time.
(I apologize if my assumption that you are not familiar with C is off base)
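As a rough sketch of the explanation above, here is the article's for loop wrapped in a function with each clause annotated. The function name `copy_string` and the null-pointer check are assumptions added for this illustration, and the caller is assumed to supply a destination buffer that is large enough.

```c
#include <assert.h>

/* Copies the string at from, including its terminating zero byte, to to.
   Assumes to points at a buffer large enough to hold the copy. */
void copy_string(char *to, const char *from)
{
    assert(to && from);        /* added precondition, not in the article    */

    for (
        ;                      /* no initialisation: both pointers are set  */
        (*to = *from) != 0;    /* copy one byte, stop once that byte was 0  */
        ++from, ++to           /* otherwise advance both pointers one byte  */
        )
        ;                      /* empty body: the loop header does the work */
}
```

Because the copy happens inside the loop condition, the terminating zero is written before the loop stops.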
for (; (*to = *from) != 0; ++from, ++to)
to and from are pointers to characters. When we increment them, we point to the next memory location in each string.
"hello"   // a sequence of bytes with a zero marking the end
 ^-- to
In our loop, we don't need to allocate any helper variables, and can just increment our pointers to point to the next memory slot in each string. I'm not a C expert, so I may get this subtly wrong, but I believe this copies the same bytes as:
do {
    *to++ = *from++;       // copy character from source to destination, then advance both pointers
} while (0 != to[-1]);     // until the character just copied was the '\0' terminator
I really feel that the for loop should be converted to a while loop - that would probably make the intent clearer.
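One possible way to write the loop as a while loop, offered as a sketch rather than as the article's code. The names `copy_while` and `copy_explicit` and the null-pointer checks are assumptions; the second function is the longer test-before-copy form, which needs a separate statement to write the terminator.

```c
#include <assert.h>

/* The for loop rewritten as a while loop: copy a byte, and advance only if
   that byte was not the terminator. */
void copy_while(char *to, const char *from)
{
    assert(to && from);
    while ((*to = *from) != 0) {
        ++from;
        ++to;
    }
}

/* A longer form that tests before copying, so the terminator has to be
   written by a separate statement after the loop. */
void copy_explicit(char *to, const char *from)
{
    assert(to && from);
    while (*from != 0) {
        *to = *from;
        ++from;
        ++to;
    }
    *to = 0;   /* without this line the copy is left unterminated */
}
```

Both functions leave the same bytes in the destination; the difference is only in how much of the work is visible as separate statements.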
When I was younger I did a lot more interior alignment between lines, like with the list of variable declarations. Over the years I've found that they add a lot of busywork effort to the editing process. Everyone talks about optimizing for reading, but not optimizing for editing, which in some cases is really what you need to be optimizing for. But even laying that aside, the readability improvement of such spacing was debatable. Sometimes it's making visible a repetition (presumably one that couldn't be done with an actual loop), but a lot of times it feels more like a novelist who decided that the main verbs of their sentences should be vertically aligned. Agreed, it's making something visible -- but is that really what the average reader cares about? Ultimately the typography should not be a distraction from discerning the sense of a piece of code.
I'm also not sure that I want a developer spending much of their brain space or time prettying up the code beyond what is currently considered well-formatted code. While it might be nice, there are probably better things they could be working on.
Why are we forced to stick with a single fixed-width font and color, limited use of italics, and no use of boldface?
Not necessarily ;-) Since proportional fonts became available in programmers' editors, I've been using them. See http://www.michielovertoom.com/incoming/desksnap-20101013.pn... for an example.
Sublime also allows proportional fonts, whereas TextMate does not.
Code is not read linearly like a book - it is 'scanned' and reviewed back and forth; and the compactness of the code is important for readability.
Also, on currently standard (sadly) computers, especially laptops, vertical space is very restricted by standard LCD dimensions. If you spread a screen of semantically linked code across two screens, then you suddenly can't grasp it all at once without scrolling back and forth, and that is a real loss. Newlines and empty lines can and must be used to group things in "paragraphs", but the OP's suggestions waste far too many lines.
>We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”
No. No no no. That is how you end up with people who can't parse `char* a, b;`. What you're doing is declaring that `* to` and `* from` are `char`s. The common reading that the post suggests is wrong and detrimental to reading C properly.
Basically IDE formatting should be outside of the actual code formatting, allowing people to format code how they see fit. Some are going to like to see the code the "standards" way and that's fine; many of us will not.
@tmoertel's point is a prime example of this, as the only relationship I personally see in the aligned code is that they are all variables. His perceived parallelism was not present in my personal view of that same code snippet. Now for me white-space denotes parallelism more poignantly than alignment.
var x        = shape.left(),
    y        = shape.right(),
                                  <--**
    numSides = shape.sides();
meh, I give up trying to format this code block. :P
** This denotes parallels between variables to me, not the alignment.
The only parallel that is drawn for me is that all the items are variables (this would also be reinforced by the color I give variables). I group things with white-space; so in the context of the function using the variables as I've laid them out above, x and y are related and most likely linked, whereas numSides is needed but not necessarily associated with the x and y variables inside the function.
So ya, this stuff is very subjective and as @tmoertel said "subtle and tricky".
The OP does seem to favour a table-like grid layout for source code, yet I rather feel like source code describes hierarchical trees (that is why we like the indenting). Arranging things in a tabular way is not principally beneficial. Sometimes you encounter things like this:
int               x = get_width();
const long double y = 1.5 * get_width();
But what use is the aligning? Type, identifier and value of `x` are far apart, so it is easy to switch lines here.
Now most programming languages have a big problem with indentation and layout because their syntax is weird. C's habit of putting types in front, etc. That is probably why the GNU C styleguide is proposing something like this:
int
strcpy( ....
Type and identifier on separate lines. This style is not widely adopted, and that probably shows another aspect of typography: What is typographically correct depends on what is common.
Also, I think color is really useful and something that's not used as much in traditional typography.
For example, even fairly basic textbooks make extensive use of footnotes, endnotes, marginal notes, sidebars, illustrations, tabular material, bibliographies, and other similar tools. What these all have in common is that they remove valuable supporting information from the main text, but keep it available (with varying degrees of immediacy) for readers who want to refer to it, and provide conventions to help the reader find it. In some cases, they also present that material in a structured way that is more effective than plain text.
With modern interactive IDEs, we have the ability to use not only all the same tricks that traditional book typesetting does but also many more because ours can be dynamic, interactive, graphical, or any combination of the above. We can add supporting material on any of four sides around a code listing, or even overlay information on the listing itself, and we can change the information we show in those places automatically with context and/or at the reader’s request based on what they want to do next. And yet, except for debugging and for navigating around large projects, we tend to make very little use of these tools. We still mostly try to present code as a single static column of monospaced material with the occasional extra vertical space and a bit of horizontal alignment to emphasize structure, and maybe a left sidebar with line numbers and a couple of icons for bookmarks or breakpoints, or perhaps a bit of trivial dynamic highlighting during things like find/replace work.
What if instead we tried to make the main code area focus on the core logic and data, and move anything else out of the way? There are many opportunities to do that. Type annotations? Supporting data. Comments? Supporting data. Long list of module.names.before.useful.identifier? Probably supporting data as long as it’s unambiguous, probably something you want to draw attention to if it’s not. Even keywords like ‘var’ in the example code snippets don’t help a human reader much. And that’s just with the kind of conventions and languages we use today, without even considering the endless possibilities of languages and tools designed with alternative presentation in mind from the start.
None of this even needs to get in the way of the tried and tested practice of storing code in plain text files. It could all be dealt with in your editor/IDE using exactly the same kinds of techniques as the standardised-formatting tools other posters have mentioned here, and saving a file could record the supporting data in a standardised text format that is friendly to version control systems, code review tools, automated diffs, and so on.
In short, if we’re going to address readability and presentation in programming, let’s think outside the box a bit. We have the most powerful information sharing and presentation tools in human history at our disposal, and a mountain of data about usability and interface design. We can do more than worrying about how many spaces we should have between tab stops.
The irrational love programmers have for horrible monospace fonts.