The history of curly braces

(bobbemer.com)

50 points · bbunix · 13y ago · 23 comments

Aardwolf · 13y ago

My opinion about ASCII:

Why is there a back tick ` but no mirrored version of it (the ' is not)?

Why is there no degree symbol in it?

Why did they make most of the first 32 symbols useless codes instead of also symbols?

Why did they have to start the whole "newline vs carriage return" thing, why not just a single newline character from the beginning?

Why is there no pilcrow, paragraph symbol, dagger and double dagger in it?

Why no symbols for not equals, subset, intersection, union, (no) element of, AND, OR in it?

Why is it 7-bit and not 8-bit? Who uses 7 bits, honestly.

Programming languages would have had some nicer symbols available if some of the above were done...

pwg · 13y ago

> Why did they make most of the first 32 symbols useless codes instead of also symbols?

Why the first 32? Symmetry. Look at the table here: https://commons.wikimedia.org/wiki/File:ASCII_Code_Chart-Qui...

Four groups of 32, each group differing in only one bit position.

Why were the first 32 devoted to "control" codes? Because ASCII appeared in the days of mechanical paper teletype interfaces to computers, so there was a need for codes to control the teletype. Additionally, the intent was to use some of the other control codes for "control" (e.g., XON (Control-Q) and XOFF (Control-S) for flow control).

> Why did they have to start the whole "newline vs carriage return" thing,

Because when you have a mechanical paper teletype printer as your interface to the computer, the teletype printer needs to be told to do two things:

1) return the print carriage to the left margin

2) roll the paper up one line

Having the ability to do both independently (for a paper printer) allows for simulating some otherwise impossible effects (e.g., bold face).

> Why is it 7-bit and not 8-bit?

To allow bit 8 to be used for a parity bit for data transmission purposes.

> Why is there no pilcrow, paragraph symbol, dagger and double dagger in it?

> Why no symbols for not equals, subset, intersection, union, (no) element of, AND, OR in it?

Likely because, after choosing to make a 7-bit code (to allow bit 8 to be parity), there is only so much room left in only 128 slots.
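The parity idea mentioned above can be sketched in a few lines. This is only an illustration: the thread does not say whether even or odd parity was used, so even parity is an assumption here, and the helper names `add_even_parity` and `check_even_parity` are made up for the example.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: a 7-bit ASCII code with an even-parity bit in bit 8."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    ones = bin(code).count("1")
    parity = ones % 2              # 1 when the 7-bit code has an odd number of 1-bits
    return code | (parity << 7)    # bit 8 makes the total count of 1-bits even


def check_even_parity(byte: int) -> bool:
    """True if the 8-bit value still has an even number of 1-bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 0b1000011, three 1-bits, so bit 8 is set
corrupted = sent ^ 0b00000100      # flip a single bit "in transit"
print(check_even_parity(sent), check_even_parity(corrupted))
```

A single flipped bit is detected because it changes the count of 1-bits from even to odd; two flipped bits would go unnoticed, which is the usual limit of a single parity bit.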

lanna · 13y ago

> Four groups of 32, each group differing in only one bit position.

That's impossible. If you have four groups, they have to differ in at least TWO bit positions.
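The arithmetic behind this exchange can be checked directly. The sketch below assumes the usual layout of the 7-bit table, with the top two bits selecting one of four groups of 32; the function name `group` is just for illustration.

```python
def group(code: int) -> int:
    """Index (0-3) of the 32-code group, taken from the top two bits of a 7-bit code."""
    return (code >> 5) & 0b11


# Four characters that share the same low five bits (00111), one per group.
for ch in ("\x07", "'", "G", "g"):
    print(f"{ord(ch):07b} -> group {group(ord(ch))}")

# Neighbouring groups can differ in a single bit: upper and lower case differ only by 0x20.
print(hex(ord("G") ^ ord("g")))
```

So two bits are needed to tell four groups apart, while any one pair of corresponding characters (such as `G` and `g`) may still differ in only one bit position.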

rogerbinns · 13y ago

> .. allows for simulating some otherwise impossible effects (e.g., bold face).

A better example is underlining.

jbri · 13y ago

The first 32 ASCII code points are literally control codes, which are used to interface with computer hardware that makes use of ASCII - for example teletype machines.

To that extent, it's very much worthwhile to have a control code for a carriage return - setting the cursor to the start of the line without advancing to a new one. This lets you, for example, tell an electronic typewriter to double-strike a line for a boldface effect, or strike through a line of text.

And once you have that, you might as well have the control character that advances the output to the next line not move the cursor at all, since the operator can just send a carriage return if they do want to start at the beginning of the next line instead of where they left off on the previous one.
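The separation of carriage return and line feed described above can be illustrated with a toy simulation. This is a rough sketch, not a model of any real teletype: the function name `print_on_teletype`, the fixed `width`, and the choice to drop characters past the right margin are all assumptions made for the example.

```python
def print_on_teletype(stream: str, width: int = 20) -> list[list[str]]:
    """Simulate a paper printer; each cell keeps every character struck there."""
    page = [[""] * width]          # start with a single empty line
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":             # carriage return: back to the left margin, same line
            col = 0
        elif ch == "\n":           # line feed: advance the paper, column unchanged
            row += 1
            if row == len(page):
                page.append([""] * width)
        elif col < width:          # assumption: characters past the margin are dropped
            page[row][col] += ch   # strike on top of whatever is already there
            col += 1
    return page


# Underline "total" by returning the carriage and overstriking with underscores.
page = print_on_teletype("total\r_____\nnext")
print(page[0][:5])   # each cell holds a letter plus an underscore
print(page[1][5:9])  # "next" starts where the carriage was left, not at the margin
```

The second line of output shows why a bare line feed is rarely what a reader wants: without a carriage return, printing continues from the column where the previous line ended.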

With regard to bitness, computers weren't always strictly power-of-2 based (quick example: the PDP-8 was a 12-bit machine). While today, memory is cheap enough that "rounding up" 30 bits to 32, or 7 bits to 8, is definitely worth it in terms of how it simplifies your other logic, back when ASCII was developed that wasn't really the case.

enf · 13y ago

The ` and ' characters were in fact mirrored in the original ISO 646, where they were supposed to be used to overstrike accents onto other letters. The straight apostrophe comes from the ISO 8859-1 era.

There is no degree symbol because nobody ever proposed one during the standards process. Most of the punctuation came from what appeared on US typewriters at the time. Likewise pilcrow, paragraph, etc.

The first 32 characters are controls because one of the major proponents of the code was Teletype, a division of AT&T. Nobody understood what network protocols were going to turn out to look like and existing protocols were very heavy on in-band signaling. It was an attempt to eradicate the worst features of the Baudot code that was previously in use, where every character had multiple shift modes and multiple protocol interpretations.

The newline vs. carriage return thing is also an artifact of AT&T's involvement. Most US computing organizations didn't care about controls at all and wanted fixed-size records. European computing organizations wanted a single newline. The compromise in ASCII-1968 was that LF could be interpreted as CRLF if sender and receiver agreed, which became the Multics convention and thence into Unix and C.

7-bit because computers at the time universally used 6-bit characters and nobody thought the computer people would actually use the control characters, only the middle 64 characters of the code. (No lower case either.) IBM threw a wrench in the works when they went to 8-bit bytes with the 360 and others followed.

My attempt to tell the ASCII story several years ago: https://docs.google.com/file/d/0B6gxjm4UN7VjZnFmNlIzQmJoRDg/...

chris_wot · 13y ago

> Why is it 7-bit and not 8-bit?

The 8th bit allowed for various character sets, encoded in Extended ASCII. The one most people know of is ISO-8859-1, also known as Latin-1.

It's more complicated than that, I wrote about it once at length here:

http://www.randomtechnicalstuff.blogspot.com.au/2009/05/unic...

Someone · 13y ago

ASCII being 7 bits made it easy to support a larger 8-bit character set that extended it on 8-bit systems, but that is not why ASCII is 7-bit and not 8-bit.

If ASCII were 8 bits to start with, we probably would have had zillions of incompatible extensions that used Control-N and Control-O (shift out and shift in) to switch into and out of 'non-ASCII' mode. (Here's an idea to sort-of improve upon HTML and XML: instead of < and >, use shift in and shift out to delineate nodes. That way, we wouldn't need that &lt; stuff.)

Alternatively, the nine bit byte might have won the battle.

akjj · 13y ago

More importantly for the future, UTF-8 allows all of Unicode to be coded as an 8-bit extension of ASCII.
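A quick way to see what "8-bit extension of ASCII" means in practice is to look at the bytes. The sketch below uses only Python's built-in `str.encode`; the helper name `utf8_bits` and the sample characters are chosen for illustration.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return the UTF-8 encoding of a string as a list of 8-bit binary strings."""
    return [f"{b:08b}" for b in ch.encode("utf-8")]


# ASCII characters encode to a single byte with the top bit clear.
print(utf8_bits("A"))
print(utf8_bits("{"))

# Everything else uses only bytes with the top bit set, so it can never be
# mistaken for an ASCII character by software that only understands ASCII.
print(utf8_bits("é"))
print(utf8_bits("€"))

ascii_text = "{|}~"
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))
```

Because pure ASCII text is byte-for-byte identical in both encodings, existing ASCII files are already valid UTF-8.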

VMG · 13y ago

> Why is there no pilcrow, paragraph symbol, dagger and double dagger in it?

> Why no symbols for not equals, subset, intersection, union, (no) element of, AND, OR in it?

Which characters would you like to replace?

http://en.wikipedia.org/wiki/ASCII#ASCII_printable_character...

Aardwolf · 13y ago

Which to replace? Well, for example NUL, "start of heading", "start of text", "end of text", "end of transmission", and all the other now meaningless ones :) And then there's number 127 too of course.

Replace them with some characters useful for programming languages and basic text (and the problem is that Unicode is a bit too big, confusing and redundant for programming languages...)

kps · 13y ago

Some of the design decisions of ASCII are covered in the original standard (first link) and in the other references here:

American Standard Code for Information Interchange http://www.wps.com/projects/codes/X3.4-1963/index.html

Revised U.S.A. Standard Code for Information Interchange http://www.wps.com/J/codes/Revised-ASCII/index.html 

Eric Fischer, The Evolution of Character Codes, 1874-1968 http://www.pobox.com/~enf/ascii/ascii.pdf

Tom Jennings, An annotated history of some character codes or ASCII: American Standard Code for Information Infiltration http://www.wps.com/J/codes/

R W Bemer, The 1960 Survey of Coded Character Sets: The Reasons for ASCII, http://www.trailing-edge.com/~bobbemer/SURVEY.HTM

R W Bemer, Design of an improved transmission/data processing code http://dx.doi.org/10.1145/366532.366538 

Charles E. MacKenzie, Coded Character Sets: History and Development, ISBN 978-0201144604

EEGuy · 13y ago

> Why is it 7-bit and not 8-bit?

In addition to all the other history cited, there was also an early digital telephony multiplexing ("T1") practice of the day called "robbed bit signalling"[1] [2]. You got 7 bits clean, 8 bits - not so much.

----------

[1] https://en.wikipedia.org/wiki/Robbed-bit_signaling

[2] https://en.wikipedia.org/wiki/8-bit_clean

thristian · 13y ago

The article mentions that at one point the ASCII standard moved the alphabetic characters one space to the left, because the author happened to have examined a phone directory while visiting Copenhagen and noted that Denmark sorts its three accented characters at the end of their alphabet.

An interesting trivium: the IRC protocol lets users choose their own nicknames, and demands that nicknames be compared in a case-insensitive manner. For the purposes of IRC, you must treat {|} as the 'lower-case' versions of [\], because IRC was invented in Scandinavia and it did indeed use those character-codes for accented characters.

e12e · 13y ago

And here I suspected you were trolling... (in that case, if you've successfully trolled the RFC - more power to you :)

http://tools.ietf.org/html/rfc2812#section-2.2

" 2.2 Character codes

   No specific character set is specified. The protocol is based on a
   set of codes which are composed of eight (8) bits, making up an
   octet.  Each message may be composed of any number of these octets;
   however, some octet values are used for control codes, which act as
   message delimiters.

   Regardless of being an 8-bit protocol, the delimiters and keywords
   are such that protocol is mostly usable from US-ASCII terminal and a
   telnet connection.

   Because of IRC's Scandinavian origin, the characters {}|^ are
   considered to be the lower case equivalents of the characters []\~,
   respectively. This is a critical issue when determining the
   equivalence of two nicknames or channel names."
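The rule quoted from RFC 2812 section 2.2 can be sketched as a small case-folding helper. This is a minimal illustration of that one paragraph, not a complete IRC implementation; the names `_RFC_FOLD`, `irc_fold` and `same_nick` are hypothetical.

```python
# Map A-Z to a-z, plus the four extra pairs the RFC lists:
# [ -> {,  ] -> },  \ -> |,  ~ -> ^
_RFC_FOLD = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\~",
    "abcdefghijklmnopqrstuvwxyz{}|^",
)


def irc_fold(name: str) -> str:
    """Fold a nickname or channel name to its 'lower case' form per the quoted rule."""
    return name.translate(_RFC_FOLD)


def same_nick(a: str, b: str) -> bool:
    """True if two names are equivalent under the quoted case mapping."""
    return irc_fold(a) == irc_fold(b)


print(same_nick("Nick[away]", "nick{away}"))
print(same_nick("a[", "a]"))
```

Note that Python's own `str.lower()` would not be enough here, since it leaves `[`, `]`, `\` and `~` unchanged.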

pygy_ · 13y ago

He mentions that the backslash was added to the character set to allow the representation of boolean operators when paired with its forward cousin:

    A /\ B \/ C

I wonder why that notation was abandoned in programming languages.

chinpokomon · 13y ago

& and ^ are just one character? Actually I think you could use both to disambiguate between logical and bitwise. I'm not sure if it is a good idea yet, but it would force the developer to see them as different operations; something that easily confuses beginners.

vanderZwan · 13y ago

As much as they are hated by some, there's empirical data indirectly suggesting that using curly braces for scoping might lead to fewer mistakes when reading code:

> Similarly, the twospaces version of counting demonstrated that vertical space is more important than indentation to programmers when judging whether or not statements belong to the same loop body. Programmers often group blocks of related statements together using vertical white space, but our results indicate that this seemingly superficial space can cause even experienced programmers to internalize the wrong program.

http://arxiv.org/abs/1304.5257

Of course, whether or not curly braces significantly help in this situation would require another experiment. Anecdotally though, I do feel like it requires less mental effort to structure code that I'm reading.

jckt · 13y ago

Google's cache here (as the site seems to be down for some people):

http://goo.gl/PWIme
