"This is a string containing an embedded \"quoted\" string"
Then I have to think about whether or not the system I'm going to send that string to is going to "helpfully" remove the backslashes, in which case I need to write:
"This is a string containing an embedded \\\"quoted\\\" string"
God help you if you want to go two levels deep.
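To put a number on "two levels deep", here's a small Python sketch that uses `json.dumps` as a stand-in for whatever escaping the receiving system does. That's an assumption; real systems have their own rules. It counts the backslashes in front of the inner quote after each layer of quoting.

```python
import json

def quote_once(s: str) -> str:
    """Wrap s in one layer of double-quoted literal syntax (JSON rules assumed)."""
    return json.dumps(s)

def backslashes_before(text: str, marker: str) -> int:
    """Count the backslashes immediately preceding the first occurrence of marker."""
    end = text.index(marker)
    start = end
    while start > 0 and text[start - 1] == "\\":
        start -= 1
    return end - start

original = 'embedded "quoted" string'
counts = []
level = original
for depth in range(1, 4):
    level = quote_once(level)  # add one more layer of quoting
    counts.append(backslashes_before(level, '"quoted'))
    print(depth, counts[-1], level)
```

Under these rules the backslash count before the innermost quote goes 1, 3, 7, i.e. 2^n − 1 for n layers, which is why nesting gets out of hand so fast.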
All this horrible complexity could have been avoided if we could just write:
«This is a string containing an «embedded» quoted string»
Alas.
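Reading such a literal doesn't take much machinery: count nesting depth and stop at the closing mark that brings it back to zero. That is the idea behind the PostScript (...) and Tcl {...} literals. Here is a rough Python sketch (the function name is made up). Unbalanced delimiters would still need some kind of escape, which this sketch doesn't handle.

```python
def read_balanced(src: str, start: int, open_ch: str = "«", close_ch: str = "»") -> tuple[str, int]:
    """Read a literal that starts at src[start] == open_ch.

    Nested open/close pairs are kept verbatim; the literal ends at the
    close_ch that brings the nesting depth back to zero.
    Returns (contents, index just past the closing delimiter).
    """
    if src[start] != open_ch:
        raise ValueError(f"expected {open_ch!r} at index {start}")
    depth = 0
    for i in range(start, len(src)):
        ch = src[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return src[start + 1:i], i + 1
    raise ValueError("unterminated literal")


text, end = read_balanced("«This is a string containing an «embedded» quoted string»", 0)
print(text)  # This is a string containing an «embedded» quoted string
print(read_balanced("{The quick {brown fox}}", 0, "{", "}"))  # ('The quick {brown fox}', 23)
```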
ASCII did add <>, [], and {}, any of which could have been used for quoted strings, had the programming language designers chosen that option.
https://en.wikipedia.org/wiki/String_literal#Paired_delimite... points out that PostScript and Tcl have a string literal which allows matched quotes.
PostScript: (The quick (brown fox))
Tcl: {The quick {brown fox}}
Yes, but that's a pretty rare case, much more so than embedded strings.
Even that case could be solved by having two different quotes, like Python, which allows both 'string' and "string". So you could do:
«This is a string that mentions the ” character without escaping it»
“This is a string that mentions the « character without escaping it”
Yes, there are still some edge cases, like embedding both “ and « in the same string. But that's really rare.
I think this is actually desirable, since in your case the escape denotes different semantics. The unescaped pairs act like quotation operators while the escaped version is a character literal.
Only if there were no chance that an unbalanced quote would ever need to appear in the string.
Okay.
``quoted''
is how you're supposed to write short quotes in the TeX/LaTeX typesetting system. [edit: My point being that the author seems to think this type of quoting originated with X11, which is actually newer than TeX (the X Window System was first released in 1984, and X11 itself in 1987), and that the prevalence of this type of quoting likely originated with TeX when it was released in 1978, which isn't mentioned at all in the article. In fact, since TeX/LaTeX is what all the CS, physics, and math types were using for journal articles, it is likely the X11 font bitmap glyphs were intentionally shaped like curly quotes to make editing your TeX source files prettier.
At least, that's how I remember it...]
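For anyone who hasn't used TeX: the source ``quoted'' comes out as “quoted”. Here's a rough Python approximation of that conversion using plain string replacement. This is not how TeX actually does it (there the font's ligatures handle it), just an illustration of the mapping.

```python
def tex_quotes(src: str) -> str:
    """Approximate TeX's quote handling: `` '' ` ' become curly quotes."""
    # Doubled forms first, so ``...'' isn't turned into pairs of single quotes.
    for ascii_form, curly in (("``", "\u201c"), ("''", "\u201d"),
                              ("`", "\u2018"), ("'", "\u2019")):
        src = src.replace(ascii_form, curly)
    return src


print(tex_quotes("``quoted''"))   # “quoted”
print(tex_quotes("`like this'"))  # ‘like this’
```

The second example is the `like this' style, which the same rules turn into a proper ‘…’ pair.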
Hey, imagine being able to nest strings without escaping! What a concept!
> For the constructs except here-docs, single characters are used as starting and ending delimiters. If the starting delimiter is an opening punctuation (that is (, [, {, or < ), the ending delimiter is the corresponding closing punctuation (that is ), ], }, or >). If the starting delimiter is an unpaired character like / or a closing punctuation, the ending delimiter is the same as the starting delimiter. Therefore a / terminates a qq// construct, while a ] terminates both qq[] and qq]] constructs.
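Here's a rough Python sketch of the delimiter rule in that quote. `qq_body` is a hypothetical helper, not part of any Perl tooling. I'm assuming that nested pairs are skipped for bracketing delimiters, based on my reading of perlop's fuller description, so treat that part as an assumption. Backslash escapes are ignored entirely.

```python
PAIRED = {"(": ")", "[": "]", "{": "}", "<": ">"}

def closing_delimiter(opening: str) -> str:
    """Opening punctuation closes with its partner; anything else closes with itself."""
    return PAIRED.get(opening, opening)

def qq_body(src: str) -> str:
    """Extract the body of a one-line qq construct such as qq{...} (hypothetical helper).

    Assumption: when the delimiter is an opening bracket, nested pairs are
    skipped by depth counting. Backslash escapes are not handled.
    """
    if not src.startswith("qq") or len(src) < 3:
        raise ValueError("expected qq followed by a delimiter")
    opening = src[2]
    closing = closing_delimiter(opening)
    nests = opening in PAIRED
    depth = 1
    for i in range(3, len(src)):
        ch = src[i]
        if nests and ch == opening:
            depth += 1
        elif ch == closing:
            depth -= 1
            if depth == 0:
                return src[3:i]
    raise ValueError("unterminated qq construct")


print(qq_body("qq/a b/"))    # a b
print(qq_body("qq]abc]"))    # abc  (a closing bracket also ends the construct)
print(qq_body("qq{a{b}c}"))  # a{b}c
```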
Nesting string literals without escaping is a somewhat poor concept, though. Firstly, what does that even mean? Given `abc `def' ghi', what is the string here? Is it abc def ghi or is it abc `def' ghi? Secondly, what if I want to just have an unbalanced ` character in the string data?
And I guess you could count HTML in, too.
It doesn't look that much better, and it always fucks with me at random times. That shit is on the list of annoying problems that shouldn't exist in the first place, along with the \r\n line-ending thing, the txt-saved-as-rtf thing, and the UTF-8 byte order mark at the beginning of the file.
Someone complains that your program gives them an error when they open a CSV file you sent them. You tested your program, and it works. You spend 30 minutes on the phone with them trying to figure out what the fuck is going on. There it is: the file was opened in a program that meddles with the "" and replaces them with the “” shit.
Also, there has to have been at least one time you got fucked by the "" -> “” snobbiness: you go to a random WordPress site, paste the command they tell you to run into the command line, and realize it doesn't work. You pull your hair out for a couple of minutes, and there is that sneaky ” thing. WordPress does that to anything it doesn't think is code (inb4 ”good programmers don't paste commands from wordpress to GNU+bash“).
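Here's a minimal Python sketch for spotting and undoing that substitution before you run a pasted command. The mapping table is my own selection and only covers quotes, so extend it as needed.

```python
import unicodedata

# Curly quotes that commonly sneak into copied commands, mapped to the ASCII
# characters a shell expects. The selection is an assumption; extend as needed.
ASCII_FOR = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # “ ” „
    "\u2018": "'", "\u2019": "'",                 # ‘ ’
})

def suspicious_chars(command: str) -> list[tuple[int, str, str]]:
    """List (index, char, Unicode name) for every non-ASCII character."""
    return [(i, ch, unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(command) if ord(ch) > 127]

def to_ascii_quotes(command: str) -> str:
    """Replace the curly quotes listed in ASCII_FOR with ASCII quotes."""
    return command.translate(ASCII_FOR)


pasted = "grep \u201cerror\u201d /var/log/syslog"
for index, ch, name in suspicious_chars(pasted):
    print(index, ch, name)
print(to_ascii_quotes(pasted))  # grep "error" /var/log/syslog
```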
One of the first things that I do when I set up a new Mac computer is to turn that damn "" -> “” ““““feature”””” off.
John Doe ‘42
An abomination ‘cause it's a damn apostrophe, not opening a quotation.
System Preferences -> Keyboard -> Text -> Use smart quotes and dashes (uncheck).
On the mac:
Option [ “ (English open quote)
Option Shift [ ” (English close quote)
A lesser-known feature is that other quotation marks are also available from that same US-English keyboard:
Option \ « (French open quote)
Option Shift \ » (French close quote)
Option Shift W „ (German open quote)
Tell that to GCC:
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
Looks good to me, by the way.
> Where ``quoting like this'' comes from
I did it for a while out of a habit acquired from working with TeX. In TeX, it is the source code syntax for encoding quotes. Of course, it is lexically analyzed and converted to proper typesetting.
> If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote').
It looks like shit in any font in which the apostrophe is a little nine, which is historically correct. What you want is a little "six" on one side and a "nine" on the other, or at least some approximation thereof. Even if the apostrophe is crappily rendered as a little vertical notch, it still pairs with a backwards-slanted `.
(The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)
> If you can use Unicode characters ...
then you should still stick to ASCII unless you have other good reasons to. ``Can'' is not the same thing as ``should'', let alone ``must''.
> For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font Lucida Console (size 14) like this:
The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.
Why? Using non-ASCII Unicode characters acts as a nice canary for detecting character encoding issues. Besides, why would I purposely limit my text to ASCII? It doesn't even suffice for English, let alone almost any other language I use, including my native Dutch, as well as German and Japanese.
> (The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)
"Historically", U+0027 has been used as all of an opening quote, a closing quote, an apostrophe, a prime symbol, an ʻokina, a modifier, etc.
So the historically correct thing is to render it as a vertical notch so it looks non-horrible in all these uses, and to render U+2018 and U+2019 as the "little six" and "little nine" symbols, respectively.
You don't need to speculate what the representation caters to; the Unicode spec actually does explain this (see Unicode 9.0 Chapter 6 Section 2)...
> The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.
So your alternative is to change behavior because of which font is default on a system from 1984 which no one uses anymore?
All the apostrophes look like a little nine: in contractions like it's, and the possessive 's.
That's the character that was included in the American Standard Code for Information Interchange.
Image: http://www.worldpowersystems.com/J/codes/X3.4-1963/page5.JPG
The glyph appearing in the standard looks like a little 9. It is denoted as "APOS" in parentheses. A reference to it is made in A6.8, calling it "apostrophe".
Wikipedia's (https://en.wikipedia.org/wiki/Apostrophe) page refers to a vertical notch glyph as a "typewriter apostrophe". The normal non-typewriter apostrophe looks like a comma.
Okina? That indicates a glottal stop in some languages, none of which is English, so understandably it was not represented in the American Standard Code.
HTML tags have nothing to do with less than or greater than, yet here we are.
In conclusion, it's simply convenient to use the standard keys we have, and to use the symbols on them to mean something different from their original meaning, so that we can express ourselves succinctly without spending so much time typing.
Of course you could always buy yourself an APL keyboard and write your programs in APL and use an APL REPL as your command line instead of using bash ;)
Edit: I forgot HTAB was actually part of ASCII. Oh well!
* The unaccented letter
* The backspace character
* The accent character
If this makes no sense to you, try to imagine a literal, physical typewriter. Windows line terminators also work with a similar principle.
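To make the overstrike idea concrete, here's a Python sketch that turns letter + backspace + accent sequences into precomposed Unicode characters. The mapping from ASCII characters to combining marks is an assumption for illustration, not something ASCII itself specifies.

```python
import unicodedata

# Assumed mapping from ASCII "accent" characters to Unicode combining marks.
COMBINING = {
    "`": "\u0300",   # COMBINING GRAVE ACCENT
    "'": "\u0301",   # COMBINING ACUTE ACCENT
    "^": "\u0302",   # COMBINING CIRCUMFLEX ACCENT
    "~": "\u0303",   # COMBINING TILDE
    '"': "\u0308",   # COMBINING DIAERESIS
}

def overstrike(text: str) -> str:
    """Collapse 'letter, backspace, accent' triples into single accented characters."""
    out = []
    i = 0
    while i < len(text):
        if (i + 2 < len(text)
                and text[i + 1] == "\b"
                and text[i + 2] in COMBINING):
            out.append(text[i] + COMBINING[text[i + 2]])  # base letter + combining mark
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC composes e.g. "e" + U+0301 into the single code point U+00E9.
    return unicodedata.normalize("NFC", "".join(out))


print(overstrike("cafe\b'"))    # café
print(overstrike("nai\b\"ve"))  # naïve
```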
′ PRIME (U+2032)
″ DOUBLE PRIME aka inch mark (U+2033)
have their own code points
http://practicaltypography.com/foot-and-inch-marks.html
which describes implications for typesetting coordinates and other things:
118° 19′ 43.5″
118° 19’ 43.5” wrong (curly quotes, although it renders identically in some fonts)
118° 19' 43.5" right
An ʻokina (U+02BB, as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.
But for anything related to contemporary typesetting on the web I recommend Practical Typography and especially the Type Composition chapter:
http://practicaltypography.com/type-composition.html#links
Including notes on quotes and apostrophes:
http://practicaltypography.com/straight-and-curly-quotes.htm...
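If you're not sure which of these look-alikes you're actually dealing with, Python's `unicodedata` module will tell you. The sketch below also formats the coordinate example with real prime marks. `format_dms` is a made-up helper name.

```python
import unicodedata

# Characters that are easy to confuse in many fonts.
LOOKALIKES = ["'", "\u2019", "\u2018", "\u2032", "`", "\u02bb", '"', "\u201d", "\u2033"]

for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

def format_dms(degrees: int, minutes: int, seconds: float) -> str:
    """Format a coordinate with PRIME (U+2032) and DOUBLE PRIME (U+2033)."""
    return f"{degrees}\u00b0 {minutes}\u2032 {seconds}\u2033"


print(format_dms(118, 19, 43.5))  # 118° 19′ 43.5″
```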
If you've ever seen an English-language page on a Japanese website that used weird quotes, this is probably why.
It starts with a regular parenthesis "(" but ends with a fullwidth parenthesis "）" (U+FF09).
Is this a similar thing?
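One way to see what's going on is to ask Python's `unicodedata` module. One difference worth noting is that NFKC normalization maps the fullwidth parenthesis back to ASCII, while curly quotes are left unchanged because they have no compatibility decomposition. The helper names below are made up.

```python
import unicodedata

def describe(ch: str) -> str:
    """Code point, Unicode name, and East Asian width class of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (width {unicodedata.east_asian_width(ch)})"

def fold_compat(s: str) -> str:
    """Apply NFKC normalization, which folds compatibility variants such as fullwidth forms."""
    return unicodedata.normalize("NFKC", s)


print(describe("("))                     # U+0028 LEFT PARENTHESIS (width Na)
print(describe("\uff09"))                # U+FF09 FULLWIDTH RIGHT PARENTHESIS (width F)
print(fold_compat("(text\uff09"))        # (text)
print(fold_compat("\u201ctext\u201d"))   # “text”  (unchanged)
```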
• It is semantically idiotic because it's an accent, not a character.
• It is visually annoying because you almost can't see the thing.
• It is bad for usability, because on non-US keyboards the accents are implemented as dead keys. Yes, accent + space gives you the character but that's really unintuitive for people who grew up expecting accents only over letters.
Sacrificing these bits of ASCII is fine by me, because the language is small enough, and I also allow Unicode. For example, curved quotes are allowed and can be nested or contain ASCII quotes without escaping:
// Character literals
‘'’
=
'\''
// Text literals
“Some "text" with “curved quotes”.”
=
"Some \"text\" with “curved quotes”."
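Here's a Python sketch of how the second equivalence could be computed mechanically, with the curved-quote nesting handled by depth counting. The language isn't named, so the function name and the escaping rules (backslash before " and \) are assumptions.

```python
def curved_to_ascii_literal(src: str) -> str:
    """Translate a curved-quote text literal (which may nest) into an
    ASCII double-quoted literal with backslash escapes.

    Hypothetical helper: the escaping rules are assumed, not taken from
    any real language specification.
    """
    if not src.startswith("\u201c"):
        raise ValueError("expected an opening curved quote")
    depth = 0
    for i, ch in enumerate(src):
        if ch == "\u201c":
            depth += 1
        elif ch == "\u201d":
            depth -= 1
            if depth == 0:
                body = src[1:i]  # inner curved quotes are kept verbatim
                break
    else:
        raise ValueError("unterminated text literal")
    escaped = body.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(curved_to_ascii_literal('“Some "text" with “curved quotes”.”'))
# "Some \"text\" with “curved quotes”."
```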
For the sake of usability, of course, everything in the core language and standard library has an ASCII spelling, like in Perl 6. I’d like other languages to adopt this view as well. If new languages allow proper Unicode notation in some sensible places, then programming editors’ input methods will catch up, e.g., by automatically replacing “->” with “→” or “\theta” with “θ” (like Emacs’ TeX input mode).
Also, does anyone know of a reference for keyboard layouts from around the world that includes estimates of the number of people using them? I’ve tried to keep things relatively easy to type on all the major layouts I know of, but I don’t want to alienate anyone if I can help it.
By that metric, wouldn't & be too Anglo-centric, and # be too Euro-centric? There are layouts out there on which neither is readily available.
https://www.gnu.org/prep/standards/html_node/Quote-Character...
> Although GNU programs traditionally used 0x60 (‘`’) for opening and 0x27 (‘'’) for closing quotes, nowadays quotes ‘`like this'’ are typically rendered asymmetrically, so quoting ‘"like this"’ or ‘'like this'’ typically looks better.
Is this link saying I can quit using `QUOTES' in my Emacs documentation? That style always struck me as odd :)
https://www.gnu.org/software/emacs/manual/html_node/elisp/Do...
If you’re writing manpages, though, you should be using the -mdoc macros (https://manpages.bsd.lv/mdoc.html), which have “Dq” and “Sq” macros that wrap the arguments in double and single quotes respectively.
And of course I used ascii analogues to type these into HN :-(
But why, though? To the best of my knowledge, HN supports Unicode quite well, including the following quotes: »«›‹„“‚‘ (available with the help of AltGr, and sometimes Shift, on the y, x, v, and b keys when the German keyboard layout is selected on my computer).
Same probably goes for French and other languages with their own sets of quotation marks.
We now have two characters for the apostrophe, and extra ambiguity when processing right single quotes correctly. Great job not breaking historical documents, Unicode.
It looks terrible and to me it's a disgrace!
Example: it`s versus it's or it’s (the first one is wrong).
Often it's the quotes which have been silently (automatically) converted to a visually similar (but functionally incompatible) character variant.
Benefits of typing and using typographer’s quotes directly in your JS/JSON/HTML/source:
1. No backslashes or other escape sequences needed!
2. WYSIWYG
3. Retina screens and gorgeous modern fonts mean that your sloppy quotes will look extra bad if you just use ASCII quotes
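Here's a quick Python check of point 1 for JSON, using the standard `json` module. ASCII double quotes inside a string have to be escaped, but typographer's quotes don't, as long as you allow non-ASCII output.

```python
import json

ascii_quotes = 'She said "hello"'
curly_quotes = "She said \u201chello\u201d"

# ASCII double quotes must be escaped inside a JSON string...
print(json.dumps(ascii_quotes))                      # "She said \"hello\""
# ...typographer's quotes need no escaping when non-ASCII output is allowed.
print(json.dumps(curly_quotes, ensure_ascii=False))  # "She said “hello”"
# With the default ensure_ascii=True they come out as \u escapes instead.
print(json.dumps(curly_quotes))                      # "She said \u201chello\u201d"
```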
I remember staring at the file for a long time when I first saw an m4 macro. My brain kept telling me that surely this had to be a typo, but then everything worked as expected. Later I learned that's the proper way of quoting there.
P.S. What a coincidence: I was googling this very question just yesterday.