ASCII and Unicode quotation marks

(cl.cam.ac.uk)

176 points · anschwa · 8y ago · 186 comments

88 comments · 26 top-level

lisper8y ago· 15 in thread

The fact that ASCII does not have balanced quotes is one of the great catastrophes of computing. It makes everything more complicated than it needs to be, from embedding code in strings to parsing CSV files to regexps. For example, if I want to embed a quoted string in another quoted string, I have to escape the inner quotes like so:

"This is a string containing an embedded \"quoted\" string"

Then I have to think about whether or not the system I'm going to send that string to is going to "helpfully" remove the backslashes, in which case I need to write:

"This is a string containing an embedded \\"quoted\\" string"

God help you if you want to go two levels deep.

All this horrible complexity could have been avoided if we could just write:

«This is a string containing an «embedded» quoted string»

Alas.
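
A minimal sketch of why distinct opening and closing quotes nest without escaping: a reader only needs a depth counter. The function name `split_balanced` and its parameters are made up for illustration, and the sketch deliberately has no escape mechanism, so a lone quote character inside the data is reported as an error.

```python
def split_balanced(text, open_q="«", close_q="»"):
    """Return the contents of each top-level quoted span in text."""
    spans = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_q:
            if depth == 0:
                start = i + 1  # content begins after the opening quote
            depth += 1
        elif ch == close_q:
            if depth == 0:
                raise ValueError(f"unbalanced closing quote at index {i}")
            depth -= 1
            if depth == 0:
                spans.append(text[start:i])  # inner quotes stay in the content
    if depth != 0:
        raise ValueError("unbalanced opening quote")
    return spans
```

Inner quote pairs are kept verbatim in the returned content, which is the behavior the example string above relies on.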

eesmith8y ago

The complexity might be minimized, but not avoided. You would still need an escape mechanism for something like «She said «The \» key on the server doesn't work.»»

ASCII did add <>, [], and {}, any of which could have been used for quoted strings, had the programming language designers chosen that option.

https://en.wikipedia.org/wiki/String_literal#Paired_delimite... points out that PostScript and Tcl have a string literal which allows matched quotes.

  PostScript: (The quick (brown fox))
  Tcl: {The quick {brown fox}}

stormbrew8y ago

Ruby lets you use arbitrary delimiters for string literals with %q{} (where the braces can be a bunch of things). I wish more languages would adopt this tbh.

4 more replies

lisper8y ago

> You would still need an escape mechanism for something like...

Yes, but that's a pretty rare case, much more so than embedded strings.

Even that case could be solved by having two different quotes, like Python which allows both 'string' and "string". So you could do:

«This is a string that mentions the ” character without escaping it»

“This is a string that mentions the « character without escaping it”

Yes, there are still some edge cases, like embedding both “ and « in the same string. But that's really rare.

1 more reply

xelxebar8y ago

> You would still need an escape mechanism for ...

I think this is actually desirable, since in your case the escape denotes different semantics. The unescaped pairs act like quotation operators while the escaped version is a character literal.

dragonwriter8y ago

Also Ruby:

  %q{This is a string with an %q{embedded quote}.}

1 more reply

zaxomi8y ago

Actually, ASCII has a mechanism for solving the problem that you describe: the control codes FS, GS, RS and US.

jiggunjer8y ago

I disagree. Sure it might regex better, but my typing speed and typo rate would be much worse if I had to type separate open and close quotes for all my strings.

coldtea8y ago

>All this horrible complexity could have been avoided if we could just write

Only if there were no chance that unbalanced quotes would need to appear in the string.

mort968y ago

You'd still have escape sequences for those cases.

tome8y ago

If ASCII had balanced quotes then they would be used by programming languages to delimit strings and we would be back to square one with regards to escaping them!

hk__28y ago

You don’t need escaping in «This is a string containing an «embedded» quoted string».

4 more replies

vvanders8y ago

Yeah, seen a ton of tools that auto-format to left/right quote automatically but then output ASCII and mangle the conversion.

agumonkey8y ago

let's rewrite social idioms to use < > as quotes.

Symbiote8y ago

«These characters» are the usual way of quoting in several languages.

See https://en.wikipedia.org/wiki/Guillemet

1 more reply

cgtyoder8y ago

> The fact that ASCII does not have balanced quotes is one of the great catastrophes of computing.

Okay.

peapicker8y ago· 8 in thread

I'm pretty sure text like:

  ``quoted''

Is how you're supposed to write short quotes in the TeX/LaTeX typesetting system.

[edit: My point being that the author seems to think this type of quoting originated with X11... which is actually newer than TeX (X11 was first released in 1984), and that the prevalence of this type of quoting likely originated with TeX when it was released in 1978... which isn't mentioned at all in the article. In fact, since TeX/LaTeX is what all the CS, Physics, and Math types were using for journal articles, it is likely the X11 font bitmap glyphs were intentionally shaped like curly quotes to make editing your TeX source files prettier.

At least, that's how I remember it...]

leephillips8y ago

Interesting historical note. Of course that's TeX input, and the author using it knows that it will be interpreted in TeX's special way and the correct characters used in the typeset output. Also, with the current Unicode-aware TeX engines, you can just input the normal Unicode quotation marks. That makes your source easier to read.

ams61108y ago

Yeah but that gets rendered as the proper quote glyph in the final document.

gbacon8y ago

Another giveaway of a TeX-savvy writer out of water is when you see --- for em-dash, i.e., ‘—’.

gbacon8y ago

Does HN markdown understand — or ‘?

EDIT: Nope.

2 more replies

fish_fan8y ago

That's because single quotes used to be rendered as a right single quote (as you might have in a contraction), and the backtick was angled much less aggressively. That is, it looked much more natural at the time.

emmelaich8y ago

Yeah, I think the motivation for `' is for markup too. I'm pretty sure they've been recommended in GNU info and groff for that reason.

ISL8y ago

It is, and denotes opening and closing quotes.

dheera8y ago

I always hated this horrible inconsistency.

    \left( \right)
    \left{ \right}
    `` ''

Why not

    \left" \right"
    \left' \right'

Better yet, make it completely DRY:

    \( \)
    \{ \}
    \`` \''
    \` \'

treve8y ago· 6 in thread

It just occurred to me how much easier certain text operations (like syntax highlighting, regular expressions and other parsers) would be if we consistently used the right Unicode symbols for quotes and apostrophes.

mbrock8y ago

The only languages I know off the top of my head that use balanced delimiters for strings are M4 and Perl 6.

Hey, imagine being able to nest strings without escaping! What a concept!

chaosfox8y ago

Perl does that as well, and you can even choose the delimiters you wanna use:

> For the constructs except here-docs, single characters are used as starting and ending delimiters. If the starting delimiter is an opening punctuation (that is (, [, {, or < ), the ending delimiter is the corresponding closing punctuation (that is ), ], }, or >). If the starting delimiter is an unpaired character like / or a closing punctuation, the ending delimiter is the same as the starting delimiter. Therefore a / terminates a qq// construct, while a ] terminates both qq[] and qq]] constructs.
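
The rule in that quoted passage can be sketched as a small lookup. This is Python rather than Perl, and `closing_delimiter` is a hypothetical helper name used only to illustrate the rule, not anything from Perl itself.

```python
# Opening punctuation and its matching close, as listed in the quoted rule.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def closing_delimiter(opening):
    """Return the character that ends a construct started by `opening`."""
    if len(opening) != 1:
        raise ValueError("delimiter must be a single character")
    # Unpaired characters and closing punctuation terminate themselves.
    return PAIRS.get(opening, opening)
```

So `closing_delimiter("[")` gives `]`, while both `/` and `]` map to themselves, matching the qq//, qq[] and qq]] cases in the quote.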

kazinator8y ago

PostScript! It uses (...) for strings.

Nesting string literals without escaping is a somewhat poor concept, though. Firstly, what does that even mean? Given `abc `def' ghi', what is the string here? Is it abc def ghi or is it abc `def' ghi? Secondly, what if I want to just have an unbalanced ` character in the string data?

stevelosh8y ago

Common Lisp doesn't have it built in, but the cl-interpol library adds this (and you can add your own custom delimiters too).

http://weitz.de/cl-interpol/#syntax

jstimpfle8y ago

Not a language, but I adopted this concept as well: http://jstimpfle.de/projects/wsl/main.html

And I guess you could count HTML in, too.

psadauskas8y ago

And every time I get in an argument with a poorly-escaped CSV file, I wish we had just used ASCII 28-31 as delimiters. (File, Group, Record and Unit Separator)
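
A rough sketch of what that could look like, assuming fields never contain the separator characters themselves. The names `dumps` and `loads` are invented for this example and this is not an established file format.

```python
US = "\x1f"  # ASCII 31, unit separator: placed between fields
RS = "\x1e"  # ASCII 30, record separator: placed between records

def dumps(rows):
    """Join rows of string fields using the ASCII separator characters."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator control character")
    return RS.join(US.join(row) for row in rows)

def loads(data):
    """Split separator-delimited text back into rows of fields."""
    if data == "":
        return []  # an empty string decodes to no records
    return [record.split(US) for record in data.split(RS)]
```

Commas, double quotes and newlines inside a field need no quoting or escaping here, because none of them is a delimiter.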

jimmies8y ago· 5 in thread

I hate the "" -> “” thing with a passion. I don't know how much productivity that the world has lost with that “” shit.

It doesn't look that much better, and it always fucks with me at random times. That shit is on the list of annoying problems that shouldn't exist in the first place, along with the \nl\cr thing, and the txt saved as rtf thing, and the UTF-8 encoding-character-at-the-beginning-of-the-file or whatever it is called.

Someone complains your program gives them an error when they open a csv file you sent them. You tested your program, it works. You go on the phone with them for 30 minutes, try to figure out what the fuck was going on. There it is, it was opened in a program that meddles with that "" and replaces it with the “” shit.

Also, there has to be at least one time you're fucked by the "" -> “” snobbiness when you go to a random Wordpress site and paste the command they tell you to do to the command line and realize it doesn't work. You pull your hair for a couple of minutes, and there is that sneaky ” thing. Wordpress does that for anything it doesn't think as code (inb4 ”good programmers don't paste commands from wordpress to GNU+bash“).

One of the first things that I do when I set up a new Mac computer is to turn that damn "" -> “” ““““feature”””” off.

hunter2_8y ago

You forgot to mention leading apostrophes getting autocorrected to left_single_quote instead of right_single_quote, as in

John Doe ‘42

An abomination ‘cause it's a damn apostrophe, not opening a quotation.

peterburkimsher8y ago

For anyone wondering, this is how to turn that feature off.

System Preferences -> Keyboard -> Text -> Use smart quotes and dashes (uncheck).

athenot8y ago

On a US-English keyboard on the mac, you can always type the proper glyphs directly on the keyboard. That's always been harder on windows where one would have to type some numerical character code each time they wanted some non-ascii character. That's where the auto-replace originated.

On the mac:

    Option [          “      (English open quote)
    Option Shift [    ”      (English close quote)

A lesser-known feature is that some other quotations are also possible from that same US-English keyboard:

    Option \          «      (French open quote)
    Option Shift \    »      (French close quote)
    Option Shift W    „      (German open quote)

hunter2_8y ago

> at-the-beginning-of-the-file

That thing's the BOM.

jiggunjer8y ago

Which you don't really need with UTF-8, it only has a purpose for UTF-16+.
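
For anyone bitten by this, a small sketch of dropping a leading UTF-8 BOM in Python. The helper names are made up for the example; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

def decode_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, accepting input with or without a BOM."""
    return raw.decode("utf-8-sig")
```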

1 more reply

kazinator8y ago· 5 in thread

> Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote').

Tell that to GCC:

  /usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
  (.text+0x18): undefined reference to `main'

Looks good to me, by the way.

> Where ``quoting like this'' comes from

I did it for a while out of a habit acquired from working with TeX. In TeX, it is the source code syntax for encoding quotes. Of course, it is lexically analyzed and converted to proper typesetting.

> If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote').

It looks like shit in any font in which the apostrophe is a little nine, which is historically correct. What you want is a little "six" on one side and a "nine" on the other, or at least some approximation thereof. Even if the apostrophe is crappily rendered as a little vertical notch, it still pairs with a backwards-slanted `.

(The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

> If you can use Unicode characters ...

then you should still stick to ASCII unless you have other good reasons to. ``Can'' is not the same thing as ``should'', let alone ``must''.

> For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font Lucida Console (size 14) like this:

The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

Freak_NL8y ago

> then you should still stick to ASCII unless you have other good reasons to.

Why? Using non-ASCII Unicode characters acts like a nice canary for detecting character encoding issues. Besides, why would I purposely limit my text to ASCII? It doesn't even suffice for English, let alone almost any other language I use — including my native language Dutch, German, and Japanese.

kazinator8y ago

All sorts of reasons. Diagnostic printf message in some embedded firmware. Do you need to drag Unicode into it? Git log message. Ditto.

2 more replies

Zarel8y ago

> It looks like shit in any font in which the apostrophe is a little nine, which is historically correct. What you want is a little "six" on one side and a "nine" on the other, or at least some approximation thereof. Even if the apostrophe is crappily rendered as a little vertical notch, it still pairs with a backwards-slanted `.

> (The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

"Historically", U+0027 has been used as all of an opening quote, a closing quote, an apostrophe, a prime symbol, an ʻokina, a modifier, etc.

So the historically correct thing is to render it as a vertical notch so it looks non-horrible in all these uses, and render U+2018 and U+2019 as the "little six" and "little nine" symbols.

You don't need to speculate what the representation caters to; the Unicode spec actually does explain this (see Unicode 9.0 Chapter 6 Section 2)...

> The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

So your alternative is to change behavior because of which font is default on a system from 1984 which no one uses anymore?

kazinator8y ago

Open any random book in the English language printed in the last 200 years.

All the apostrophes look like a little nine: in contractions like it's, and the possessive 's.

That's the character that was included in the American Standard Code for Information Interchange.

Image: http://www.worldpowersystems.com/J/codes/X3.4-1963/page5.JPG

The glyph appearing in the standard looks like a little 9. It is denoted as "APOS" in parentheses. A reference to it is made in A6.8, calling it "apostrophe".

Wikipedia's (https://en.wikipedia.org/wiki/Apostrophe) page refers to a vertical notch glyph as a "typewriter apostrophe". The normal non-typewriter apostrophe looks like a comma.

Okina? That indicates a glottal stop in some languages none of which are English, and so which were understandably not represented in the American Standard Code.

1 more reply

tedunangst8y ago

gcc will output fancy Unicode quotes if you set locale. This of course even more fun if LANG is set incorrectly and you still have an 8 bit xterm; then the entire quoted string just disappears!

garou8y ago· 4 in thread

It's very odd for me to see the grave accent (`) as a quoting mark in bash and other programming languages. I understand that the accent alone loses its function in human language. But it's still uncomfortable to see an accent as a delimiter for a string.

unkown-unknowns8y ago

Well, we also use the dollar sign to signify a variable in bash and PHP even though we aren't talking about an amount of USD. Likewise we use single and doublequote to mean special things.

HTML tags have nothing to do with less than or greater than, yet here we are.

In conclusion, it's simply convenient to use the standard keys we have and to use the symbols that are on it to mean something different from their original meaning in order to be able to express ourselves succinctly so that we don't have to spend so much time typing as we'd otherwise have to.

Of course you could always buy yourself an APL keyboard and write your programs in APL and use an APL REPL as your command line instead of using bash ;)

sp3328y ago

I'm not even sure why ASCII has a grave accent. There are no combining marks so you could never write it over another letter.

Edit: I forgot HTAB was actually part of ASCII. Oh well!

electroly8y ago

On a teletype, ALL characters are combining marks because you can backspace (another ASCII character derived directly from teletype codes) and type another character overtop it.

2 more replies

saint_fiasco8y ago

In ASCII instead of combining marks what you have to do is write three characters:

* The unaccented letter

* The backspace character

* The accent character

If this makes no sense to you, try to imagine a literal, physical typewriter. Windows line terminators also work with a similar principle.
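
The three-character sequence above can be translated into modern Unicode as a sketch. The mapping from ASCII accent characters to combining marks is an assumption made for illustration (in particular, treating the apostrophe as an acute accent), and `overstrike_to_unicode` is a hypothetical name.

```python
import unicodedata

# Assumed mapping from ASCII "accent" characters to Unicode combining marks.
COMBINING = {
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
}

def overstrike_to_unicode(text):
    """Turn letter + backspace + accent sequences into accented letters."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == "\b" and text[i + 2] in COMBINING:
            out.append(text[i] + COMBINING[text[i + 2]])
            i += 3  # consumed letter, backspace and accent
        else:
            out.append(text[i])
            i += 1
    # NFC composes letter + combining mark into a single code point where one exists.
    return unicodedata.normalize("NFC", "".join(out))
```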

1 more reply

mxfh8y ago· 3 in thread

to add to the confusion:

′ PRIME (U+2032)

″ DOUBLE PRIME aka inch mark (U+2033)

have their own codepoints

http://practicaltypography.com/foot-and-inch-marks.html

which describes implications for typesetting coordinates and other things:

118° 19′ 43.5″

118° 19’ 43.5” wrong (curly quotes, although it renders identical in some fonts)

118° 19' 43.5" right
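
As a tiny illustration of the first form, here is a sketch that builds a coordinate string from the dedicated code points (U+00B0, U+2032, U+2033). The function name `format_dms` is invented for this example.

```python
def format_dms(degrees, minutes, seconds):
    """Format an angle using the degree sign, PRIME and DOUBLE PRIME."""
    return f"{degrees}\u00b0 {minutes}\u2032 {seconds}\u2033"
```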

timb078y ago

And further confusion:

An ʻokina (U+02BB, as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.

https://en.wikipedia.org/wiki/%CA%BBOkina

mark-r8y ago

Those should be added to the document, with a note that they are NOT quotes!

mxfh8y ago

I doubt this will ever be updated, since it's a reference for a very specific, 20-year-old, code-interpretation-related proposal and not meant for typesetting.

But for anything related to contemporary typesetting on the web I recommend Practical Typography and especially the Type Composition chapter:

http://practicaltypography.com/type-composition.html#links

Including notes on quotes and apostrophes:

http://practicaltypography.com/straight-and-curly-quotes.htm...

http://practicaltypography.com/apostrophes.html

1 more reply

darkengine8y ago· 2 in thread

MS PGothic, a very common font in Japan, still uses this type of quote. "Quoting like this'' (double quote, then two single quotes) looks the most natural in this font. "Using two double quotes" looks quite odd (see screenshot) [1]

If you've ever seen an English-language page on a Japanese website that used weird quotes, this is probably why.

[1] https://i.imgur.com/zcuFZa1.png

boondaburrah8y ago

Ah, the old dead giveaway "this game was translated from japanese and we CBA to handle localisation properly" fonts.

zeratax8y ago

So I just saw parentheses like this "(）" on japanese twitter

it starts with a regular parenthesis "(" but ends with a fullwidth parenthesis "）"(U+FF09)

is this a similar thing?

ttepasse8y ago· 2 in thread

The use of an accent as syntax in markup and programming languages annoys me to no end. And it is still being adopted to this day; the latest example is template strings in JavaScript.

• It is semantically idiotic because it's an accent, not a character.

• It is visually annoying because you almost can't see the thing.

• It is bad for usability, because on non-US keyboards the accents are implemented as dead keys. Yes, accent + space gives you the character but that's really unintuitive for people who grew up expecting accents only over letters.

evincarofautumn8y ago

Same, I’ve never cared for it. For these reasons I’ve decided to take a stand and avoid using the grave accent for anything in a programming language I’m working on. Same goes for the dollar sign, because it’s somewhat Americentric, and as a currency character it doesn’t have any great semantic or mnemonic value except for, well, currency units. I guess you could argue for $trings (BASIC) or $calars (Perl) if you have $igils, but I don’t.

Sacrificing these bits of ASCII is fine by me, because the language is small enough, and I also allow Unicode. For example, curved quotes are allowed and can be nested or contain ASCII quotes without escaping:

    // Character literals
    ‘'’
    =
    '\''

    // Text literals
    “Some "text" with “curved quotes”.”
    =
    "Some \"text\" with “curved quotes”."

For the sake of usability, of course, everything in the core language & standard library has an ASCII spelling, like in Perl6. I’d like for other languages to adopt this view as well. If new languages allow proper Unicode notation in some sensible places, then programming editors’ input methods will catch up, e.g., automatically replacing “->” with “→” or “\theta” with “θ” (like Emacs’ TeX input mode).

Also, does anyone know of a reference for keyboard layouts from around the world that includes estimates of the number of people using them? I’ve tried to keep things relatively easy to type on all the major layouts I know of, but I don’t want to alienate anyone if I can help it.

int_19h8y ago

> Same goes for the dollar sign, because it’s somewhat Americentric, and as a currency character it doesn’t have any great semantic or mnemonic value except for, well, currency units.

By that metric, wouldn't & be too Anglo-centric, and # be too Euro-centric? There are layouts out there on which neither is readily available.

2 more replies

ratmice8y ago· 2 in thread

FYI For a long time GNU coding standards prescribed using the grave accent, but this changed some years ago now

https://www.gnu.org/prep/standards/html_node/Quote-Character...

josteink8y ago

From the link:

> Although GNU programs traditionally used 0x60 (‘`’) for opening and 0x27 (‘'’) for closing quotes, nowadays quotes ‘`like this'’ are typically rendered asymmetrically, so quoting ‘"like this"’ or ‘'like this'’ typically looks better.

Is this link saying I can quit using `QUOTES' in my Emacs-documentation? That style always struck me as odd :)

lottin8y ago

Curved single quotes (‘...’) are recommended now:

https://www.gnu.org/software/emacs/manual/html_node/elisp/Do...

13of408y ago· 2 in thread

CSB: Years ago I was working on a team that developed a scripting language and we had this recurring problem where someone would write up a code sample in a Word document and it would break if you cut and pasted it because all of the single and double quotes would be Unicode. My boss was this tough guy who tried to snap the whole team to a standard of strictly disabling that behavior in all of our Office applications, but I piped up and said maybe we should just make the language treat all of those characters like apostrophes and quotes.... I think around version 5 they finally made an API for doing proper anti-injection escaping because you pretty much needed a PhD to get it right due to all of the variations introduced by the extended characters.

delinka8y ago

Or ... use a text editor?

13of408y ago

You know a lot of PMs who write specs in notepad?

tempodox8y ago· 2 in thread

I would fain use the curly quotes if only Darwin's groff(1) wouldn't barf on them. For the time being, man pages for one still need to quote like ``this''.

anjbe8y ago

In troff you can escape “ ” ‘ ’ as \(lq, \(rq, \(oq, \(cq respectively.

If you’re writing manpages, though, you should be using the -mdoc macros (https://manpages.bsd.lv/mdoc.html), which have “Dq” and “Sq” macros that wrap the arguments in double and single quotes respectively.

robin_reala8y ago

brew install groff? That’ll get you 1.22.3 instead of the default 1.19.2.

gumby8y ago· 2 in thread

I find it interesting that the article includes a German keyboard that doesn't include the proper ,,'' (or ,') quotation glyphs. However, it does include grave and acute accents as well as French primary quotations (<< and >>), though not the secondary guillemets (quotation characters < and >), none of which are used in German text.

And of course I used ascii analogues to type these into HN :-(

YSFEJ4SWJUVU68y ago

>And of course I used ascii analogues to type these into HN :-(

But why, though? To the best of my knowledge, HN supports unicode quite well, including the following quotes: »«›‹„“‚‘ (available with the help of AltGr and sometimes shift from keys y, x, v, b when selecting the German keyboard layout on my computer).

gumby8y ago

I'm using a travel laptop on a plane and it came with a US keyboard

dmitriid8y ago· 2 in thread

It's worse for other languages. Russian quotation marks are « and ». Thanks to early computers being predominantly from/designed in the US, they are now hijacked by American quotes.

Same probably goes for French and other languages with their own sets of quotation marks.

ansgri8y ago

«Russian» quotation marks are actually the « French » ones with different spacing. There's another, less used set of quotes in Russian, so called „German“ ones (used as inner quotes and in handwriting). English quotes are widely accepted though.

contingencies8y ago

Modern Chinese usage includes all of 《》〈〉「」『』【】“” and probably others, roughly in that order. Modern typographic convention is perhaps 《title》「quote」 but that's surely opinion and debatable. Hong Kong and Taiwan have their own typesetting conventions, distinct from mainland China, and in the latter case no doubt influenced by Japanese occupation and cultural inflow (manga, etc.). Historically for most of Chinese history written language had no punctuation, and sentence endings were merely inferred from context, which was historically clearer 也. See https://en.wikipedia.org/wiki/Chinese_punctuation and https://en.wiktionary.org/wiki/%E4%B9%9F#Definitions (definition #4)

jiggunjer8y ago· 1 in thread

What bothers me about Unicode isn't that apostrophe (U+0027) is overloaded by having two semantic meanings ("apostrophe" or "single straight quote"), but that they exacerbate the confusion by recommending to overload "right single quote" (U+2019) to also mean apostrophe.

We now have two characters for apostrophe and extra ambiguity for processing correct right single quotes. Great job not breaking historical documents Unicode.

Loic8y ago

And now, imagine that your own name has an apostrophe in it. Like my family name. I can tell you, I crashed many databases and in 90% of the cases where people need to find again my name in a database, it is ending up with requesting my address because each time a different character is put by the clerk doing the data entry and they cannot match my name. Even state level authorities are bad, really bad, at it.

Tepix8y ago· 1 in thread

It seems that half of the people in this company use the grave accent ` as an apostrophe instead of ' or ’. Unfortunately it's the half that creates presentations and talks to customers.

It looks terrible and to me it's a disgrace!

Example: it`s versus it's or it’s. (first one is wrong).

ygra8y ago

I've seen a café which had its name written in large, lit letters on the façade and it included the following gem: Cafe`. Yes, the wrong accent, and not even combined. Easy access to DTP tools (or even a word processor) for the typographically uneducated masses ends up with quite painful results sometimes.

sengork8y ago

Things become really fun when you're trying to figure out why that command fails when you've copy/pasted it from another application window.

Often it's the quotes which have been silently (automatically) converted to a visually similar (but functionally incompatible) character variant.
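
A quick way to undo that conversion before pasting, sketched in Python. The table covers only the four common curly quotes, which is an assumption; other look-alike characters exist. `straighten_quotes` is a made-up name.

```python
# Map the common typographic quotes back to their ASCII counterparts.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(command):
    """Replace curly quotes in a pasted command with ASCII quotes."""
    return command.translate(SMART_TO_ASCII)
```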

mirimir8y ago

In another life, I analyzed enterprise data. Variation in quotation marks was a common problem. I mean, is it "D'arcy" or "D’arcy"? Sometimes, I think, people would mangle data in spreadsheets, with auto-correct on.

alanh8y ago

While I can’t expect many to follow suit, I myself often type educated quotes and nice apostrophes. The macOS keyboard combinations (nearly-intuitive combinations of Option-(Shift)-[ and -] for “”‘’) have long been committed to muscle memory. And since nearly all (web) file formats seem to be UTF-8, the days of manually typing “ and friends are long, long gone.

Benefits of typing and using typographer’s quotes directly in your JS/JSON/HTML/source:

1. No backslashes or other escape sequences needed!

2. WYSIWYG

3. Retina screens and gorgeous modern fonts mean that your sloppy quotes will look extra bad if you just use ASCII quotes

rdtsc8y ago

> The Unix m4 macro processor is probably the only widely used tool that uses the `quote' combination as part of its input syntax; however, even that could be modified via changequote.

I remember staring for a long time at the file when I first saw an m4 macro. My brain was telling me that surely this had got to be a typo, but then everything worked as expected. Then I learned that's the proper way of quoting there.

timb078y ago

It's a little bit off-topic since the article was primarily about quotation marks and coding, but it would have been good if it mentioned that an ʻokina (as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.

https://en.wikipedia.org/wiki/%CA%BBOkina

jiggunjer8y ago

So why isn't there a straight single quotation, but there is a straight double quotation? I get it probably arose from compatibility reasons, but nowadays Unicode should be able to offer something?

P.S. Major coincidence: I was googling this very question yesterday.

exikyut8y ago

For reference, the BIOS text-mode font included with some IBM PCs (I've observed this on NetVistas and ThinkPads myself, at least) renders ` as a nice-looking opening quote, and ' looks like a nice closing quote.

audiodude8y ago

Honestly I've been seeing `quote' in bash and other CLIs for my entire career and always thought they were just funny or strange, but carried no meaning.

jrochkind18y ago

MRI ruby still does this in some error messages. I hate it. Always messing up my copy-and-paste into `` markdown too.

crottypeter8y ago

Could do with a (2007) suffix.

j / k navigate · click thread line to collapse

186 comments

88 comments · 26 top-level

lisper8y ago· 15 in thread

"This is string containing an embedded \"quoted\" string"

Then I have to think about whether or not the system I'm going to send that string to is going to "helpfully" remove the backslashes, in which case I need to write:

"This is a string containing an embedded \\"quoted\\" string"

God help you if you want to go two levels deep.

All this horrible complexity could have been avoided if we could just write:

«This is a string containing an «embedded» quoted string»

Alas.

eesmith8y ago

The complexity might be minimized, but not avoided. You would still need an escape mechanism for something like «She said «The \» key on the server doesn't work.»»

ASCII did add <>, [], and {}, any of which could have been used for quoted strings, had the programming language designers chosen that option.

https://en.wikipedia.org/wiki/String_literal#Paired_delimite... points out that PostScript and Tcl have a string literal which allows matched quotes.

  PostScript: (The quick (brown fox))
  Tcl: {The quick {brown fox}}

stormbrew8y ago

Ruby lets you use arbitrary tokens for string literals with %s{} (where the braces can be a bunch of things). I wish more languages would adopt this tbh.

4 more replies

lisper8y ago

> You would still need an escape mechanism for something like...

Yes, but that's a pretty rare case, much more so than embedded strings.

Even that case could be solved by having two different quotes, like Python which allows both 'string' and "string". So you could do:

«This is a string that mentions the ” character without escaping it»

“This is a string that mentions the « character without escaping it”

Yes, there are still some edge cases, like embedding both “ and « in the same string. But that's really rare.

1 more reply

xelxebar8y ago

> You would still need an escape mechanism for ...

I think this is actually desirable, since in your case the escape denotes different semantics. The unescaped pairs act like quotation operators while the escaped version is a character literal.

dragonwriter8y ago

Also Ruby:

  %q{This is a string with an %q{embedded quote}.}

zaxomi8y ago

Actually, ASCII has a mechanism for solving the problem that you describe: the control codes FS, GS, RS and US.

jiggunjer8y ago

I disagree. Sure it might regex better, but my typing speed and typo rate would be much worse if I had to type separate open and close quotes for all my strings.

coldtea8y ago

>All this horrible complexity could have been avoided if we could just write

Only if there were no chance of unbalanced quotes needing to be in the string.

mort968y ago

You'd still have escape sequences for those cases.

tome8y ago

If ASCII had balanced quotes then they would be used by programming languages to delimit strings and we would be back to square one with regard to escaping them!

hk__28y ago

You don’t need escaping in «This is a string containing an «embedded» quoted string».

vvanders8y ago

Yeah, seen a ton of tools that auto-format to left/right quote automatically but then output ASCII and mangle the conversion.

agumonkey8y ago

let's rewrite social idioms to use < > as quotes.

Symbiote8y ago

«These characters» are the usual way of quoting in several languages.

See https://en.wikipedia.org/wiki/Guillemet

cgtyoder8y ago

> The fact that ASCII does not have balanced quotes is one of the great catastrophes of computing.

Okay.

peapicker8y ago· 8 in thread

I'm pretty sure text like:

  ``quoted''

Is how you're supposed to write short quotes in the TeX/LaTeX typesetting system.

At least, that's how I remember it...

leephillips8y ago

ams61108y ago

Yeah but that gets rendered as the proper quote glyph in the final document.

gbacon8y ago

Another giveaway of a TeX-savvy writer out of water is when you see --- for em-dash, i.e., ‘—’.
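
A rough Python approximation of what happens to these sequences on the way to the final document. In TeX they are font ligatures; the function below, with the made-up name `tex_ligatures`, only imitates the visible result by plain string replacement and is not how TeX itself works.

```python
# Longer sequences must be replaced before their shorter prefixes.
TEX_SEQUENCES = (
    ("---", "\u2014"),  # em dash
    ("--", "\u2013"),   # en dash
    ("``", "\u201c"),   # left double quotation mark
    ("''", "\u201d"),   # right double quotation mark
    ("`", "\u2018"),    # left single quotation mark
    ("'", "\u2019"),    # right single quotation mark / apostrophe
)

def tex_ligatures(text):
    """Replace TeX-style ASCII quote and dash sequences with Unicode glyphs."""
    for source, glyph in TEX_SEQUENCES:
        text = text.replace(source, glyph)
    return text
```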

gbacon8y ago

Does HN markdown understand &mdash; or &lsquo;?

EDIT: Nope.

fish_fan8y ago

emmelaich8y ago

Yeah, I think the motivation for `' is for markup too. I'm pretty sure they've been recommended in GNU info and groff for that reason.

ISL8y ago

It is, and denotes opening and closing quotes.

dheera8y ago

I always hated this horrible inconsistency.

    \left( \right)
    \left{ \right}
    `` ''

Why not

    \left" \right"
    \left' \right'

Better yet, make it completely DRY:

    \( \)
    \{ \}
    \`` \''
    \` \'

treve8y ago· 6 in thread

mbrock8y ago

The only languages I know off the top of my head that use balanced delimiters for strings are M4 and Perl 6.

Hey, imagine being able to nest strings without escaping! What a concept!

chaosfox8y ago

Perl does that as well, and you can even choose the delimiters you wanna use:

kazinator8y ago

PostScript! It uses (...) for strings.

stevelosh8y ago

Common Lisp doesn't have it built in, but the cl-interpol library adds this (and you can add your own custom delimiters too).

http://weitz.de/cl-interpol/#syntax

jstimpfle8y ago

Not a language, but I adopted this concept as well: http://jstimpfle.de/projects/wsl/main.html

And I guess you could count HTML in, too.

psadauskas8y ago

And every time I get in an argument with a poorly-escaped CSV file, I wish we had just used ASCII 28-31 as delimiters. (File, Group, Record and Unit Separator)
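
A minimal Python sketch of what that could look like, using US (0x1F) between fields and RS (0x1E) between records. The function names `dump_records` and `load_records` are invented for this example; this is not an existing file format or library.

```python
US = "\x1f"  # ASCII 31, Unit Separator: placed between fields
RS = "\x1e"  # ASCII 30, Record Separator: placed between records

def dump_records(rows):
    """Serialize a list of rows (lists of strings) with no quoting or escaping."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator control code")
    return RS.join(US.join(row) for row in rows)

def load_records(data):
    """Inverse of dump_records."""
    if not data:
        return []
    return [record.split(US) for record in data.split(RS)]

rows = [
    ["name", "remark"],
    ["Ann", 'She said "hi", twice\nand then left'],
]
assert load_records(dump_records(rows)) == rows
```

Commas, double quotes and newlines pass through untouched. The catch is that the separators themselves are now forbidden in the data, and they are awkward to type or view in most editors.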

jimmies8y ago· 5 in thread

I hate the "" -> “” thing with a passion. I don't know how much productivity the world has lost to that “” shit.

One of the first things that I do when I set up a new Mac computer is to turn that damn "" -> “” ““““feature”””” off.

hunter2_8y ago

You forgot to mention leading apostrophes getting autocorrected to left_single_quote instead of right_single_quote, as in

John Doe ‘42

An abomination ‘cause it's a damn apostrophe, not opening a quotation.
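
The failure mode is easy to reproduce with a naive heuristic. The sketch below is a guess at the kind of rule involved ("a quote after whitespace opens, anything else closes"), written in Python under the invented name `naive_smart_single`; it is not the actual algorithm of any operating system or word processor.

```python
LEFT_SINGLE = "\u2018"   # left single quotation mark
RIGHT_SINGLE = "\u2019"  # right single quotation mark, also the apostrophe

def naive_smart_single(text):
    """Curl ASCII apostrophes using only the preceding character as context."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        previous = text[i - 1] if i > 0 else " "
        opens = previous.isspace() or previous in "([{"
        out.append(LEFT_SINGLE if opens else RIGHT_SINGLE)
    return "".join(out)
```

This gets `it's` right but turns `'42` and `'cause` into opening quotes, because nothing in the local context distinguishes a leading apostrophe from the start of a quotation.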

peterburkimsher8y ago

For anyone wondering, this is how to turn that feature off.

System Preferences -> Keyboard -> Text -> Use smart quotes and dashes (uncheck).

athenot8y ago

On the mac:

    Option [          “      (English open quote)
    Option Shift [    ”      (English close quote)

A lesser-known feature is that some other quotations are also possible from that same US-English keyboard:

    Option \          «      (French open quote)
    Option Shift \    »      (French close quote)
    Option Shift W    „      (German open quote)

hunter2_8y ago

> at-the-beginning-of-the-file

That thing's the BOM.

jiggunjer8y ago

Which you don't really need with UTF-8, it only has a purpose for UTF-16+.
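
If you have to read files that may or may not start with a UTF-8 BOM, Python's `utf-8-sig` codec handles both cases. A short sketch (the wrapper name `decode_utf8` is made up for this example):

```python
import codecs

def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")

with_bom = codecs.BOM_UTF8 + "«quote»".encode("utf-8")
without_bom = "«quote»".encode("utf-8")
assert decode_utf8(with_bom) == decode_utf8(without_bom) == "«quote»"
```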

kazinator8y ago· 5 in thread

> Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote').

Tell that to GCC:

  /usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
  (.text+0x18): undefined reference to `main'

Looks good to me, by the way.

> Where ``quoting like this'' comes from

I did it for a while out of a habit acquired from working with TeX. In TeX, it is the source code syntax for encoding quotes. Of course, it is lexically analyzed and converted to proper typesetting.

> If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote').

(The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

> If you can use Unicode characters ...

then you should still stick to ASCII unless you have other good reasons to. ``Can'' is not the same thing as ``should'', let alone ``must''.

> For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font Lucida Console (size 14) like this:

The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

Freak_NL8y ago

> then you should still stick to ASCII unless you have other good reasons to.

kazinator8y ago

All sorts of reasons. Diagnostic printf message in some embedded firmware. Do you need to drag Unicode into it? Git log message. Ditto.

Zarel8y ago

> (The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

"Historically", U+0027 has been used as all of an opening quote, a closing quote, an apostrophe, a prime symbol, an ʻokina, a modifier, etc.

So the historically correct thing is to render it as a vertical notch so it looks non-horrible in all these uses, and render U+2018 and U+2019 as the "little six" and "little nine" symbols.

You don't need to speculate what the representation caters to; the Unicode spec actually does explain this (see Unicode 9.0 Chapter 6 Section 2)...

> The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

So your alternative is to change behavior because of which font is default on a system from 1984 which no one uses anymore?

kazinator8y ago

Open any random book in the English language printed in the last 200 years.

All the apostrophes look like a little nine: in contractions like it's, and the possessive 's.

That's the character that was included in the American Standard Code for Information Interchange.

Image: http://www.worldpowersystems.com/J/codes/X3.4-1963/page5.JPG

The glyph appearing in the standard looks like a little 9. It is denoted as "APOS" in parentheses. A reference to it is made in A6.8, calling it "apostrophe".

Wikipedia's (https://en.wikipedia.org/wiki/Apostrophe) page refers to a vertical notch glyph as a "typewriter apostrophe". The normal non-typewriter apostrophe looks like a comma.

Okina? That indicates a glottal stop in some languages none of which are English, and so which were understandably not represented in the American Standard Code.

tedunangst8y ago

gcc will output fancy Unicode quotes if you set locale. This is of course even more fun if LANG is set incorrectly and you still have an 8 bit xterm; then the entire quoted string just disappears!

garou8y ago· 4 in thread

unkown-unknowns8y ago

Well, we also use the dollar sign to signify a variable in bash and PHP even though we aren't talking about an amount of USD. Likewise we use single and double quotes to mean special things.

HTML tags have nothing to do with less than or greater than, yet here we are.

Of course you could always buy yourself an APL keyboard and write your programs in APL and use an APL REPL as your command line instead of using bash ;)

sp3328y ago

I'm not even sure why ASCII has a grave accent. There are no combining marks so you could never write it over another letter.

Edit: I forgot HTAB was actually part of ASCII. Oh well!

electroly8y ago

On a teletype, ALL characters are combining marks because you can backspace (another ASCII character derived directly from teletype codes) and type another character overtop it.

saint_fiasco8y ago

In ASCII instead of combining marks what you have to do is write three characters:

* The unaccented letter

* The backspace character

* The accent character

If this makes no sense to you, try to imagine a literal, physical typewriter. Windows line terminators also work on a similar principle.
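
The three-character overstrike sequence above can be mapped onto modern Unicode fairly directly. A Python sketch, where the function name `overstrike_to_unicode` and the particular accent table are assumptions chosen for illustration:

```python
import unicodedata

# ASCII characters used as accents, and the combining marks they stand in for.
# This table is an illustrative assumption, not a published standard mapping.
ACCENTS = {
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
}

def overstrike_to_unicode(text):
    """Turn letter + backspace + accent into a single accented character."""
    out = []
    i = 0
    while i < len(text):
        is_overstrike = (
            i + 2 < len(text)
            and text[i + 1] == "\b"
            and text[i + 2] in ACCENTS
        )
        if is_overstrike:
            out.append(text[i] + ACCENTS[text[i + 2]])
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC composes base letter + combining mark where a precomposed form exists.
    return unicodedata.normalize("NFC", "".join(out))
```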

mxfh8y ago· 3 in thread

to add to the confusion:

′ PRIME (U+2032)

″ DOUBLE PRIME aka inch mark (U+2033)

have their own codepoints

http://practicaltypography.com/foot-and-inch-marks.html

which describes implications for typesetting coordinates and other things:

118° 19′ 43.5″

118° 19’ 43.5” wrong (curly quotes, although it renders identically in some fonts)

118° 19' 43.5" right
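
If the coordinates are generated by code, the prime characters can be emitted directly instead of relying on whatever the keyboard produces. A tiny Python sketch (the function name `format_dms` is made up for this example):

```python
DEGREE = "\u00b0"        # degree sign
PRIME = "\u2032"         # prime: arc minutes, feet
DOUBLE_PRIME = "\u2033"  # double prime: arc seconds, inches

def format_dms(degrees, minutes, seconds):
    """Format an angle as degrees, minutes and seconds using prime marks."""
    return f"{degrees}{DEGREE} {minutes}{PRIME} {seconds}{DOUBLE_PRIME}"
```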

timb078y ago

And further confusion:

An ʻokina (U+02BB, as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.

https://en.wikipedia.org/wiki/%CA%BBOkina

mark-r8y ago

Those should be added to the document, with a note that they are NOT quotes!

mxfh8y ago

I doubt this will ever be updated, since it's a reference for a very specific, 20-year-old proposal about code interpretation and not meant for typesetting.

But for anything related to contemporary typesetting on the web I recommend Practical Typography and especially the Type Composition chapter:

http://practicaltypography.com/type-composition.html#links

Including notes on quotes and apostrophes:

http://practicaltypography.com/straight-and-curly-quotes.htm...

http://practicaltypography.com/apostrophes.html

darkengine8y ago· 2 in thread

If you've ever seen an English-language page on a Japanese website that used weird quotes, this is probably why.

[1] https://i.imgur.com/zcuFZa1.png

boondaburrah8y ago

Ah, the old dead giveaway "this game was translated from japanese and we CBA to handle localisation properly" fonts.

zeratax8y ago

So I just saw parentheses like this "(）" on japanese twitter

it starts with a regular parenthesis "(" but ends with a fullwidth parenthesis "）"(U+FF09)

is this a similar thing?
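
One way to check what a lookalike character really is, and to fold fullwidth forms back to their ASCII counterparts, is Unicode compatibility normalization. A Python sketch (the wrapper name `fold_fullwidth` is invented here; NFKC also folds many other compatibility characters, so it is a blunt tool):

```python
import unicodedata

def fold_fullwidth(text):
    """Apply NFKC, which maps fullwidth forms to their ordinary equivalents."""
    return unicodedata.normalize("NFKC", text)

mixed = "(\uff09"  # ASCII left parenthesis followed by U+FF09
assert unicodedata.name(mixed[1]) == "FULLWIDTH RIGHT PARENTHESIS"
assert fold_fullwidth(mixed) == "()"
```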

ttepasse8y ago· 2 in thread

The usage of an accent as syntax in markup and programming languages annoys me to no end. And it is still being used to this day, the latest example being template strings in JavaScript.

• It is semantically idiotic because it's an accent, not a character.

• It is visually annoying because you almost can't see the thing.

evincarofautumn8y ago

    // Character literals
    ‘'’
    =
    '\''

    // Text literals
    “Some "text" with “curved quotes”.”
    =
    "Some \"text\" with “curved quotes”."

int_19h8y ago

> Same goes for the dollar sign, because it’s somewhat Americentric, and as a currency character it doesn’t have any great semantic or mnemonic value except for, well, currency units.

By that metric, wouldn't & be too Anglo-centric, and # be too Euro-centric? There are layouts out there on which neither is readily available.

ratmice8y ago· 2 in thread

FYI For a long time GNU coding standards prescribed using the grave accent, but this changed some years ago now

https://www.gnu.org/prep/standards/html_node/Quote-Character...

josteink8y ago

From the link:

Is this link saying I can quit using `QUOTES' in my Emacs-documentation? That style always struck me as odd :)

lottin8y ago

Curved single quotes (‘...’) are recommended now:

https://www.gnu.org/software/emacs/manual/html_node/elisp/Do...

13of408y ago· 2 in thread

delinka8y ago

Or ... use a text editor?

13of408y ago

You know a lot of PMs who write specs in notepad?

tempodox8y ago· 2 in thread

I would fain use the curly quotes if only Darwin's groff(1) wouldn't barf on them. For the time being, man pages for one still need to quote like ``this''.

anjbe8y ago

In troff you can escape “ ” ‘ ’ as \(lq, \(rq, \(oq, \(cq respectively.
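
Those four escapes are easy to apply mechanically. A Python sketch that rewrites curly quotes in a string into the troff escapes listed above (the function name `to_troff` is made up, and it deliberately handles nothing except these four characters):

```python
TROFF_QUOTES = {
    "\u201c": r"\(lq",  # left double quotation mark
    "\u201d": r"\(rq",  # right double quotation mark
    "\u2018": r"\(oq",  # left single quotation mark
    "\u2019": r"\(cq",  # right single quotation mark
}

def to_troff(text):
    """Replace curly quotes with their troff special-character escapes."""
    return "".join(TROFF_QUOTES.get(ch, ch) for ch in text)
```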

robin_reala8y ago

brew install groff? That’ll get you 1.22.3 instead of the default 1.19.2.

gumby8y ago· 2 in thread

And of course I used ascii analogues to type these into HN :-(

YSFEJ4SWJUVU68y ago

>And of course I used ascii analogues to type these into HN :-(

gumby8y ago

I'm using a travel laptop on a plane and it came with a US keyboard

dmitriid8y ago· 2 in thread

It's worse for other languages. Russian quotation marks are « and ». Thanks to early computers being predominantly from/designed in the US, they are now hijacked by American quotes.

Same probably goes for French and other languages with their own sets of quotation marks.

ansgri8y ago

contingencies8y ago

jiggunjer8y ago· 1 in thread

We now have two characters for apostrophe and extra ambiguity for processing correct right single quotes. Great job not breaking historical documents, Unicode.

Loic8y ago

Tepix8y ago· 1 in thread

It seems that half of the people in this company wrongly use the grave accent ` as an apostrophe instead of ' or ’. Unfortunately it's the half that creates presentations and talks to customers.

It looks terrible and to me it's a disgrace!

Example: it`s versus it's or it’s (the first one is wrong).

ygra8y ago

sengork8y ago

Things become really fun when you're trying to figure out why that command fails when you've copy/pasted it from another application window.

Often it's the quotes which have been silently (automatically) converted to a visually similar (but functionally incompatible) character variant.
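
A common workaround is to straighten the quotes before pasting. A minimal Python sketch, assuming the only offenders are the four curly quote characters (the name `straighten_quotes` is invented for this example):

```python
# Map curly quotes to the ASCII characters that shells and parsers expect.
CURLY_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(text):
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(CURLY_TO_ASCII)
```

Dashes, non-breaking spaces and other silent substitutions would need their own entries in the table.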

mirimir8y ago

alanh8y ago

Benefits of typing and using typographer’s quotes directly in your JS/JSON/HTML/source:

1. No backslashes or other escape sequences needed!

2. WYSIWYG

3. Retina screens and gorgeous modern fonts mean that your sloppy quotes will look extra bad if you just use ASCII quotes

rdtsc8y ago

> The Unix m4 macro processor is probably the only widely used tool that uses the `quote' combination as part of its input syntax; however, even that could be modified via changequote.

timb078y ago

https://en.wikipedia.org/wiki/%CA%BBOkina

jiggunjer8y ago

So why isn't there a straight single quotation, but there is a straight double quotation? I get it probably arose from compatibility reasons, but nowadays Unicode should be able to offer something?

P.S. Major coincidence: I was googling this very question yesterday.

exikyut8y ago

audiodude8y ago

Honestly I've been seeing `quote' in bash and other CLIs for my entire career and always thought they were just funny or strange, but carried no meaning.

jrochkind18y ago

MRI ruby still does this in some error messages. I hate it. Always messing up my copy-and-paste into `` markdown too.

crottypeter8y ago

Could do with a (2007) suffix.
