An amateur linguist loses control of the language he invented (2012)

(newyorker.com)

272 points | godarderik 11y ago | 106 comments

Reading this story brings to mind the history of algorithms in the field of machine translation. Early approaches tried to explicitly define the rules for converting between tongues using meticulously laid out systems of vocabulary and syntax. This approach proved untenable, in part due to the complex and ever-changing nature of language. Modern systems such as Google Translate make use of machine learning algorithms that are fed large amounts of source material and computationally discern relationships between them.

I wonder if a similar approach could be taken with language construction. Instead of spending 25+ years fleshing out a language in painstaking detail, computer programs could be devised that, using large amounts of input, determine the most "efficient" means of expressing information. The approach would not only be far less labor intensive, it could also accommodate the rapidly evolving nature of language, for example adding to its "dictionary" in response to new phenomena in need of naming.

andreasvc11y ago

Interlingua was constructed this way, at least its vocabulary. They made the mistake IMHO of making the grammar naturalistic, which made it very easy to read for people who already spoke a Romance language; writing, on the other hand, was made difficult by this.

You could perhaps use a typological database with grammatical features of the world's languages and somehow select an "optimal" combination from it, but that's a far cry from letting a computer determine the most efficient means of expressing information; we have no idea how to define information/meaning, so that it's still an impossible dream. I don't think the problem is that designing languages is hard per se, it's that people can't be bothered to agree on one and learn it.

cgio11y ago

Maybe it could use Minimum Message Length http://en.wikipedia.org/wiki/Minimum_message_length

mchaver11y ago

It sounds like an experiment worth testing out and could lead to some interesting results. On the other hand, I imagine most conlangers enjoy devising the details of their language.

erikb11y ago

In fact it doesn't just sound like an experiment worth doing. It sounds like something that somebody somewhere might have already done.

voronoff11y ago

To be glib, it has been done. We call it language.

Seriously, though, take a look at the link I posted in https://news.ycombinator.com/item?id=8180924

One of the techniques used is to computationally create a space of possible ways to partition semantic domains on a plane whose dimensions are simplicity and informativeness, in order to look at where in the possible space it is that real languages lie. While it's not been done (to my knowledge) for a whole language, it's a potential direction to go.

voronoff11y ago

For anyone who is interested in what an ideal language would look like, particularly in respect to brevity vs. informativeness I'd highly suggest looking into Terry Regier's work: http://lclab.berkeley.edu/

I worked in his lab on one of many projects showing that most human languages use a near optimal trade-off in various semantic domains (so far - color, kinship, containers, and spatial relations). His work also includes some of the best evidence for some language dependent forces in cognition interacting with some universal ones.

MichaelDickens11y ago

Ithkuil seems like what a language should be: as the article said, it is both precise and concise. It looks the way Esperanto ought to have looked. I find Quijada's effort deeply impressive.

I don't know much about designing human languages, but I know how hard it is to design a decent programming language (see http://colinm.org/language_checklist.html), and building a serious human language seems orders of magnitude more difficult. I've never seen an attempt that really intrigued me until I found Ithkuil.

smsm4211y ago

Language should not be concise. Redundancy is built into the language for a reason - language communication is extremely noisy and if there's only a two-bit error distance between "I love you" and "I killed and ate your dog" then the usage of this language by humans would not be comfortable.

Moreover, people communicating are imperfect. So if you have a language which is very precise and concise, you would have to spend a lot of effort to find a word or set of words which exactly expresses your meaning (in programming, we call it design when we do it upfront, and debugging when we do it post factum) and communication would be a very complex exercise. However, if you have a lot of words which mean roughly the same, you can be sure the meaning is passed through even if the words are not chosen super-carefully.
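
To make the "error distance" point concrete, here is a minimal sketch of how redundancy widens the gap between two messages. Everything in it is an assumption for illustration: the 4-bit "messages" are hypothetical, and the triple-repetition code is just the simplest redundancy scheme, not a model of how natural language works.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def repeat3(bits: str) -> str:
    """Add redundancy by sending every bit three times."""
    return "".join(b * 3 for b in bits)


def decode3(bits: str) -> str:
    """Recover the original bits by majority vote over each group of three."""
    chunks = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join("1" if c.count("1") >= 2 else "0" for c in chunks)


# Two hypothetical messages that differ in only two bits.
msg_a = "0110"
msg_b = "0101"

concise_distance = hamming(msg_a, msg_b)                       # distance without redundancy
redundant_distance = hamming(repeat3(msg_a), repeat3(msg_b))   # distance with redundancy

# Simulate noise: flip a single bit of the redundant version of msg_a.
sent = repeat3(msg_a)
flipped = "1" if sent[4] == "0" else "0"
noisy = sent[:4] + flipped + sent[5:]
recovered = decode3(noisy)
```

With the repetition code the two messages end up three times further apart, and a single flipped bit no longer changes what the listener decodes. The cost is a message three times as long, which is the trade-off being argued about.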

laichzeit011y ago

In some languages the meaning of a word is highly dependent on the pitch accent, like ancient Greek. If you're 1 bit off you have trouble :) Surprisingly enough I read an example of this yesterday evening:

"Hegelochus, the actor in Euripides' Orestes, which was presented in 408 BC, in line 279 of the play, instead of "after the storm I see again a calm sea" (galeén' horoo), Hegelochus recited "after the storm I see again a weasel" (galeên horoo)."

estebank11y ago

Case in point, the Turkish I problem has had tragic consequences: http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two...

RVuRnvbM2e11y ago

It's hard to overstate the importance of redundancy in natural languages.

This is the whole reason we are able to make ourselves understood in a noisy, imprecise world. Even if you miss a few syllables - or even half a sentence, you can usually piece together what the other person was trying to say.

Imagine someone technology-illiterate trying to describe a problem they're having with their computer in this language. Impossible.

rodgerd11y ago

> Ithkuil seems like what a language should be: as the article said, it is both precise and concise.

Language should be useful and expressive for its users. Human languages designed by the kind of people who prize simplicity and regularity above all other qualities tend to fail for much the same reason programming language designers are hurt and disappointed to discover C is still popular.

yepguy11y ago

Ithkuil is not simple. It's so ridiculously complex that it is just not possible for anyone to learn to speak fluently. It never stood any chance of being adopted in the way Esperanto has. I'm surprised that it found any use at all, but people without linguistics training seem to find it useful for discovering nuances in their own languages that they hadn't considered.

abruzzi11y ago

>Ithkuil seems like what a language should be: as the article said, it is both precise and concise.

Which is good for scientists and logicians, but would probably be an issue for novelists and poets. Exploiting ambiguity is a common feature of the arts.

andreasvc11y ago

Esperanto was designed first and foremost to be learnable; it seems to be quite successful in this regard, whereas learnability is not a strong point of Ithkuil (as it wasn't a priority). With "ought to have looked", do you mean in your own opinion, or are you referring to some of Esperanto's original goals? I always got the impression that learnability and ease of use were bigger priorities.

Personally, I don't have much faith in the idea that you can make people's communication more precise and concise by designing the language in a certain way, just as I don't think that politically correct language leads to meaningful change. This is simply because I don't believe that language can drastically change the way people think (linguistic determinism, strong Whorfianism). Esperanto is very similar to natural languages in its (im)precision, which might just be the level that humans naturally converge to if left to their own devices.

Steko11y ago

58 different phonemes means the language is just substituting one form of complexity for another.

And apq’uxasiu for 'gawk' doesn't strike me as a particularly huge win for conciseness.

ZoFreX11y ago

> And apq’uxasiu for 'gawk' doesn't strike me as a particularly huge win for conciseness.

It seems like it's missing the linguistic equivalent of huffman coding: make frequently used things shorter.

Also, it's hardly concise with respect to time if you have to spend 30 minutes consulting a dictionary to utter a single sentence.
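
For anyone who hasn't met Huffman coding, a rough sketch of the idea follows: symbols that occur often get short codes, rare ones get long codes. The word counts below are made up purely for illustration, and the `huffman_code` helper is a hypothetical name, not something from any library.

```python
import heapq


def huffman_code(freqs: dict) -> dict:
    """Build a prefix-free binary code from a {symbol: frequency} mapping."""
    # Each heap entry is (weight, tiebreaker, {symbol: code_so_far}).
    # The integer tiebreaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs at least one bit.
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical word counts, invented for this example only.
word_counts = {"the": 50, "of": 30, "see": 12, "gawk": 1}
codes = huffman_code(word_counts)

for word, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(f"{word:5s} -> {code}")
```

In this toy vocabulary the most frequent word gets a one-bit code, while the rare "gawk" gets the longest one, which is the opposite of giving a common verb nine-plus characters.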

canjobear11y ago

Given that you're communicating to an agent who can use context for disambiguation, all efficient languages will be ambiguous.

http://www.sciencedirect.com/science/article/pii/S0010027711... (Sorry for the paywall...)

So maybe Ithkuil isn't what a language should be after all. And maybe that's why no natural language looks like it...

induscreep11y ago

So basically language decoding is multimodal like one of the other comments here said? Like in digital communication, perfect decoding needs appropriate SNR (context) in addition to optimal spectral efficiency (conciseness/preciseness of the language). Interesting research.

hackerboos11y ago

I remember a criticism of Esperanto was that it was too similar to European languages. I read about a language that was similar to Esperanto but incorporated Chinese and Arabic qualities. Does anybody know this?

andreasvc11y ago

It's really only the vocabulary which takes from European languages, because the grammar is schematic, very regular and simple to learn. Broadly speaking, you have two options: 1) you select a specific group (e.g., European languages) and sample from their vocabulary. 2) you sample from ALL the world's languages, or generate completely random words (effectively just as hard to learn).

In the first case you have privileged one group, but in the second everyone loses ... In the second case you create an additional barrier because there is a large amount of new vocabulary that everyone has to learn, while in the other anyone who doesn't know European languages will have to learn some of its vocabulary, but that might be useful to them anyway, so it clearly seems like the better option to me.

ejr11y ago

If anyone wants to hear what Ithkuil sounds like : https://upload.wikimedia.org/wikipedia/commons/c/c9/Ithkuil_...

From https://en.wikipedia.org/wiki/Ithkuil

SideburnsOfDoom11y ago

Oh gods! It sounds like a tongue-twister played backwards. Clearly not designed for ease of pronunciation or for singing songs.

lloeki11y ago

> Ithkuil does not use the concept of zero

Interesting. How is one supposed to talk about math without a concept of zero?

hrvbr11y ago

It's likely not missing the concept of zero but rather the symbol in the numbering system. The 0 is not necessary for representing integers (if there's a new symbol for 10). I'm pretty sure it is necessary to write the decimal part of a real number.
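
The "no zero symbol needed if there's a symbol for 10" part can be sketched with what is usually called bijective base-10 numeration. This is only an illustration of that general idea, not a description of Ithkuil's actual number system; the digit "A" standing for ten is an assumption of this example.

```python
# Digits 1..9 plus an assumed extra symbol "A" with value ten. No "0" anywhere.
DIGITS = "123456789A"


def to_bijective(n: int) -> str:
    """Write a positive integer using digits worth 1..10 and no zero."""
    if n < 1:
        raise ValueError("only positive integers have a zero-free numeral")
    out = []
    while n > 0:
        # Shift by one so remainders 0..9 map onto digit values 1..10.
        n, r = divmod(n - 1, 10)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def from_bijective(s: str) -> int:
    """Read a zero-free numeral back into an ordinary integer."""
    value = 0
    for ch in s:
        value = value * 10 + DIGITS.index(ch) + 1
    return value


for n in (9, 10, 11, 20, 100, 110):
    print(n, "->", to_bijective(n))
```

Every positive integer gets exactly one numeral this way (10 is "A", 20 is "1A", 100 is "9A"). The number zero itself has no numeral in this scheme, so it would need a separate word.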

ejr11y ago

It's a fascinating problem. It makes me wonder, without zero, what paths mathematics would have taken. There have been civilisations that used math extensively without zero and I hope Ithkuil-fluent mathematicians some day would continue exploring this.

tomkinstinch11y ago

The same thing happened to Blissymbols[1], as documented by radiolab[2].

1. https://en.wikipedia.org/wiki/Blissymbols

2. http://www.radiolab.org/story/257194-man-became-bliss/

MBCook11y ago

That was a great episode.

This has also happened with the language Lojban[1], which was 'forked' from Loglan[2] when the creator started making copyright complaints so the community could maintain control.

Such an odd concept that someone could 'own' a language, but I guess if you created it I can see why you would want to.

[1] http://en.wikipedia.org/wiki/Lojban [2] http://en.wikipedia.org/wiki/Loglan

mchaver11y ago

I think it's a great lesson for any creator. Just because you have invented something does not necessarily mean you have the right or are capable of dictating how people use it.

I seem to remember Umberto Eco mentioning that he does not offer his own interpretations of his novels for a similar reason, but I can't find the quote.

GeneralMayhem11y ago

It's a less odd concept if you compare it to Elvish rather than English.

tokenadult11y ago

This is attracting some reader interest here, so I should probably mention, for other Hacker News participants deeply interested in human languages, a definitive analysis of Esperanto[1] explaining why Esperanto has not caught on with more speakers.

[1] http://www.xibalba.demon.co.uk/jbr/ranto/

ketralnis11y ago

I don't think that explains anything; it looks like a list of aesthetic "faults" that the author finds, unlikely to be recognised by anyone without a degree in linguistics. Surely those faults exist, but it's a stretch to pretend that they alone explain anything.

I speak some Esperanto but I'm no zealot. The "explanation" that your neighbours don't speak Esperanto, if such a simplified thing can exist, is probably as simple as network effects. I can learn German and speak to the man next-door, or I can learn Esperanto and speak to some theoretical people that may exist somewhere but I don't know them and they are mostly a bunch of nerds that meet at the local co-op to speak in Esperanto mostly about how great Esperanto is.

Go ahead. Ask your neighbour why he doesn't speak Esperanto. Does he say " The 'basic' number‐terms tri, trio, tria ('three, threesome, third') are a crowded jumble, making a mockery of the regular root/noun/adjective pattern they imitate" (K5 in the article)?

Or does he say "what's that?"?

Pamar11y ago

Agreed: studying Esperanto now makes as much sense as striving to provide CP/M compatibility in your product.

scythe11y ago

After reading it, I would say it is really written badly, but the author's point seems to be that most of Esperanto's structure was basically designed at random, with no particular goals in mind, other than looking superficially similar to other European languages. His substantive objections are as follows:

* the language has an excessive number of phonemes

me: On this point I disagree strongly. A language with more phonemes can form shorter words and convey information faster. That's just basic math (43^9 >> 16^9).

* the vocabulary is too large to be practical and has dubious links to other languages. specifically, Basic English requires far fewer (>10x fewer!) words for competence and is more recognizable due to worldwide borrowing.

me: Here I agree. The amount of memorization required is quite large and borrowing words doesn't make up for it. Interlingua took a better path, but it targeted the only continent that doesn't need it.

* synthesis mechanisms are irregular and insufficiently general. in particular, esperanto's semantic structure and word synthesis fail to allow the speaker to compensate for missing vocabulary by using compound words or overly general words with attached descriptors

[I don't understand this sort of thing, but it sounds serious]

* the alphabet is needlessly complex. it uses symbols people do not recognize and cannot type, instead of ASCII digraphs with wide recognition.

me: Also, it does not accord significant importance to the way that an orange bikeshed would clearly clash with the trees in the background.

Okay, fine, I agree. This is just putting the nail in the coffin at this point.

* the noun-to-verb-to-adjective declension is not compatible with the structure of meaning, and so cannot be extended in a consistent, predictable way. that is to say, the variety of possible verbs is such that it is basically undecidable what noun should correspond to a given verb in the general case, and vice versa.

me: I don't know this but as far as I can tell he's right. For example, what is the "verb" associated with "electron"?

* the language is inherently sexist.

me: This is not a trivial criticism.

So there are some substantial criticisms in the article, methinks. And you can't discount the impact of difficulty on network effects: if one person has too much trouble learning Esperanto, they're not going to pass it on to their friends, who will not pass it on to their friends.

I wonder if a language designed to be usable with as small a core as possible -- something the author suggests -- could have a substantially better chance of catching on? Perhaps if it were also good at borrowing words from other languages ("extensible"), and grew in the right community for a little while...
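
The phoneme arithmetic from my first objection above (43^9 >> 16^9) can be checked directly. The sketch below assumes every sequence of phonemes counts as a possible word, which no real language's phonotactics allows, so treat the numbers as an upper bound; 43 and 16 are just the inventory sizes from that comparison.

```python
import math

LARGE_INVENTORY = 43   # phoneme count used in the comparison above
SMALL_INVENTORY = 16   # the smaller inventory it was compared against
WORD_LENGTH = 9        # phonemes per word in the comparison

# Number of distinct phoneme strings of the given length (upper bound on words).
words_large = LARGE_INVENTORY ** WORD_LENGTH
words_small = SMALL_INVENTORY ** WORD_LENGTH

# Information carried by one phoneme if all are equally likely, in bits.
bits_large = math.log2(LARGE_INVENTORY)
bits_small = math.log2(SMALL_INVENTORY)

# Shortest word length at which the small inventory offers as many
# distinct strings as 9 phonemes do with the large inventory.
min_length = 1
while SMALL_INVENTORY ** min_length < words_large:
    min_length += 1

print(f"{words_large:,} vs {words_small:,} possible 9-phoneme strings")
print(f"{bits_large:.2f} vs {bits_small:.2f} bits per phoneme")
print(f"the 16-phoneme inventory needs {min_length} phonemes to match")
```

So under these assumptions a 16-phoneme language needs 13-phoneme words to cover what a 43-phoneme language covers in 9. Whether the extra phonemes are worth the learning cost is the actual argument.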

throwaway_phd11y ago

I have a lot of respect for your opinions on language (given your background as a professional translator and the solid advice you regularly give here on HN and your website).

Whenever a post about constructed languages comes up, you post this link, which I find disappointing: Rye's rant is, well, just a rant and his reasons for not learning Esperanto are just bad.

There was a time when I spent a few months learning Esperanto. I eventually gave up because 1) whilst it is relatively easy to get going with Esperanto, speaking it comfortably would require about as much work as any other language and 2) the Esperanto community is made up out of folks that are either much older than me or a little strange.

I don't think my first criticism is a fault of Esperanto. Human languages require a lot of convention and shared cultural ideas in order for communication to be compact and, at the same time, clear. No language, auxiliary or otherwise, can magically remove these requirements.

The second criticism is perhaps a little unfair but still important. Cultural cachet is important for a language and a language that appears to cater only to certain groups is going to have a tough time.

But if someone is keen to learn a new language, especially if that person is a monoglot, they should be encouraged to give it a try, even if their language of choice is Esperanto. They can only improve their linguistic skills.

hooobert11y ago

There is also other criticism of Esperanto that's worth reading (imho). Here's a linguist and former Esperanto-proponent's opinion on the community surrounding the language.

http://www.christopherculver.com/writings/esperanto.html

tripzilch11y ago

One of the final paragraphs really resonates with my own experiences of international (mostly EU) meetings:

""" During recent travels to Spain, I had the opportunity to observe participants in a pan-European seminar on youth and globalisation. While English was the default language of this group, in conversations between any two people the participants would often switch to the native language of one or the other. For example, a young man from France would greet another in English, but upon discovering that his conversation partner is from Italy, would switch to Italian. This would not find approval among Esperantists. Ironically, English proves the neutral choice here. It is often seen as a sure bet for international communication among young people in many countries, but it is well understood that other languages may serve just as well. In the Esperanto movement, on the other hand, there is an ideological attachment to Esperanto which mandates its use even if there are other, more culturally rich possibilities. """

My experience was at large international demo-parties. I mainly noticed this for English and German, but that's because I speak those two languages relatively fluently (Dutch is my mother tongue). The same must have been happening around me for French or Spanish (or who knows what else) as well, but I don't speak those languages well enough to tell for sure whether they were using (say) Spanish as a common bridging language or were native speakers.

And yes, English is usually the sure bet. Although some of the comments elsewhere ITT have piqued my interest to perhaps learn some Spanish in the future, especially if it's really that easy to learn. Funny, I used to hate learning language (French and German) in high school. It's only later in my life that I found out I actually have quite a knack for it :) (I should probably blame the way it was taught, but I can't really get that much worked up about it, I'm pretty satisfied with the quality of my education overall)

BillChapman11y ago

Hello. You cite the case where "a young man from France would greet another in English, but upon discovering that his conversation partner is from Italy, would switch to Italian." I'm not sure how frequent such a case would be. The dominance of English has pushed other languages to the sidelines in France.

As an Esperanto speaker, I have no objection to anyone using any language on any occasion, and I'm happ. I have just returned from an Esperanto conference in Dinan, Brittany. I was able to use some basic Breton with a few individuals, but the sky did not fall on my head.

I wish you well with your language learning. You may wish to add Esperanto one day.

andreasvc11y ago

That rant is certainly not definitive, he clearly has a chip on his shoulder. He lists a lot of reasons why Esperanto is not a perfect language, but (obviously) no language can be; when a language is constructed people suddenly set way higher standards, but this is not reasonable and leads to endless bikeshedding.

Still, I'd say Esperanto is an interesting case because it is undeniably the most successful constructed language, with the longest continuous (still continuing) history. It is interesting to learn it for that reason alone, and as far as learning second languages go, it is also one of the easiest to learn. That could make it a useful first step towards learning other second languages.

Why hasn't Esperanto caught on with more speakers? Maybe it has something to do with the reasons in his rant; personally I find it much more likely that it's due to network effects. Even a "perfectly" constructed language would be very unlikely to win over the vested interests and inertia of people.

WildUtah11y ago

What I wrote about Esperanto and its failure to go viral:

"There's already an existing language that fulfills so many of those criteria that it's going to be very hard to organize a new one from scratch. And the existing language already has a deep legacy of literature and culture.

That language is the world's second most widespread: Spanish.

Spanish has been destroying the dreams of Esperantists and others over the years who hope to build a more regular, orderly, and easy to learn common language based on common Indo-European roots.

Turns out that it's dang hard to design anything easier or more accessible to speakers of any European language than Spanish already is. The spelling and pronunciation are already completely regular and predictable. The grammar is straightforward and common to almost all European tongues. The vocabulary is mostly based on Latin with some Arabic variety thrown in, but it's been standardized over the centuries so that a lot of it has a simpler and more natural morphology. The sounds are a simple subset of what most languages already use.

It's a great second language: it's fairly easy, the world's second most widespread tongue, and spoken in warm countries with very friendly natives. It's not likely to provide you with many lucrative business opportunities, though. None of the world's financial capitals use it."

scythe11y ago

Spanish is a beautiful language, I speak some myself, it's probably better than Esperanto or the more European-focused Interlingua, but... consider the number of forms of a regular verb:

(lavar (present lavo lavas lava lavamos lavan) (past lavé lavaste lavó lavamos lavaron) (imperfect lavaba lavabas lavábamos lavaban) (future lavaré lavarás lavará lavaremos lavarán) (conditional lavaría lavarías lavaríamos lavarían) (present-subjunctive lave laves lavemos laven) (gerund lavando) (participle lavado))

And that's ignoring the twenty or so irregular verbs and roughly ten "irregular" patterns (e.g. querer). It's a lot simpler than French or Italian, to be sure, or Latin, for that matter, but it could be a lot easier.

ilaksh11y ago

Ithkuil is definitely one of the most amazing pieces of work I have ever come across. I have been using the name as my email address for many years and another variant of it he had called 'ilaksh' as my screen name (note I didn't have anything to do with the creation of ithkuil/ilaksh, just a fan). I think not only other conlangers but also anyone interested in fields like linguistics, computer programming, knowledge representation, etc. can be inspired by what Quijada did.

I did get a few somewhat weird emails that I think were in Russian some years ago, but I think they figured out pretty quick that it wasn't the right email address to reach Quijada.

jqm11y ago

Losing control of a language seems to be standard procedure.

If this invented language were to catch on, it likely wouldn't be a generation or two before kids who grew up speaking it would start saying the Ithkuil equivalent of things like "yo dog, that's the rad shizaz!". Then, several generations thereafter grandmothers would be regularly using the word "shizaz" and they would have to put it in the dictionary. That's just the way it goes and is probably the reason we don't all speak the same language in the first place.

That being said, I've always been fascinated by the idea of a systematically created universal language and think the world would be a much better place with one... if that were possible.

This was a neat article.

Terr_11y ago

I think there's some research out there that suggests all natural languages have about the same information density, when you factor how two people in conversation will add error-correction or extra context to frame an idea.

IMO this suggests the bottleneck is something about our brains on a biological rather than linguistic level.

godarderikOP11y ago

According to a study published several years ago, mainstream languages seem to operate on an information density/speed tradeoff [1]. The authors found that languages that are spoken faster seem to encode less information per syllable than those uttered at a slower pace.

This does seem to suggest that biology may be the limiting factor in the rate at which humans convey information. Indeed, the language mentioned in the article seems almost laughably cryptic and dense. However, I feel that the limitation of the mentioned study results from the fact that it treats information on a relatively limiting per-syllable basis. Quijada seems to suggest that an artificially constructed language has the ability to incorporate all the implicit meanings of a phrase that are left unsaid in normal conversation.

Ultimately, while Quijada's project seems quite unlikely to catch on among those who are not fringe pseudoscientists, it poses interesting philosophical questions about the nature of speech and communication and perhaps earns its title as a "conceptual-art project."

[1] http://rosettaproject.org/blog/02012/mar/1/language-speed-vs...

notahacker11y ago

The article seems to support the information density / speed tradeoff, in hinting several times that the language's inventor puts at least as much cognitive effort into agglutinating syllables to form a word in his language as he would into joining words to make a sentence in a second language.

hueving11y ago

>The authors found that languages that are spoken faster seem to encode more information per syllable than those uttered at a slower pace.

I think you mean the inverse.

micro_cam11y ago

I think it is more auditory and has to do with our ability to error correct. Actually there is a really fascinating section in James Gleick's "The Information" on African drum communication, which is essentially a much less dense version of spoken language: you say everything using a long sentence, which reduces the possibility of it being misinterpreted even though most of the sounds of normal spoken language are missing. Not sure if more scholarly work supports this idea, but if this is the case then a written or thought language could certainly be denser... like symbolic algebra or Python.

gabemart11y ago

I found this article fascinating and satisfying.

I'm curious about the desire to reduce ambiguity, which seemed to be emphasized as a motivation for the creation of Ithkuil and some of the other languages mentioned.

Is it desirable to completely eliminate ambiguity? I can see why it would be desirable in a scientific paper or a public political debate. But in everyday interactions, (intentional) ambiguity plays many important roles.

In my experience, politeness is bolstered by some level of ambiguity. Rather than explicitly state your needs, desires or opinions, you imply them at some level of abstraction, allowing other participants in the conversation to accept or decline more easily. Imagine Jessica who has brought two friends who don't know each other to see a play. They chit-chat a little afterwards, then Jessica goes home early leaving two virtual strangers to have a drink together. It's not hard to imagine the conversation going like this:

A: "Did you enjoy the play?"

B: "It was very interesting. I thought the stage dressing was a little unconventional."

A: "Yes, I noticed that too. Very creative. I was intrigued by the style of the narration. It really let the audience write the story for themselves."

B: "It certainly didn't constrain the imagination did it? I couldn't help noticing that many of the actors took a somewhat avant-garde interpretation of the source material."

A: "Yes, as if they didn't want it to seem like they were 'acting', so to speak?"

B: It was awful wasn't it!?

A: Thank god! Yes, worst thing I've ever seen!

Ambiguity allows subtle social cues (not so subtle in my example!) that avoid direct confrontation when it might be uncomfortable. If one person loved the play and the other hated it, they each might want to avoid offending the other.

Intentional ambiguity plays an important role in other social interactions like dating or friendship-making. Correct use of ambiguity protects feelings, demonstrates subtlety and good judgement, and avoids non-productive conflict.

In artistic expression too, ambiguity is often intentional or even necessary to the effectiveness of the work. Consider a poem like "My Papa's Waltz" [1]. Does it describe happy memories of the narrator's father, or dark memories of childhood abuse [2]? Can it describe both? Is there something in between? The ambiguity isn't a byproduct of imprecise language. The ambiguity is the meaning. To resolve it is to remove the point of the work. The poem cannot be effectively communicated in any medium that does not allow for the existence of ambiguity.

[1] http://www.poetryfoundation.org/poem/172103

[2] 'Yet, this poem has an intriguing ambiguity that elicits startlingly different interpretations. Kennedy calls it a scene of "comedy" and "persistent love", and Balakian, in part, labels it a "comic romp" (62). In contrast, Ciardi sees it as a "poem of terror"' - from http://www.mrbauld.com/exrthkwtz.html

jasode11y ago

>, politeness is bolstered by some level of ambiguity. Rather than explicitly state your needs, desires or opinions, you imply them at some level of abstraction, allowing other participants in the conversation to accept or decline more easily.

Steven Pinker explores this point in an entertaining presentation[1] for RSA. He also covers other language topics such as spacetime encoding and profanity, but the last part analyzes the need for ambiguity in a language. I deep-linked into the relevant portion of the presentation, although the entire talk is very enlightening. It's worth rewinding to the beginning to watch the entire talk. The 2nd youtube video[2] is mostly the same material but it's the older one he presented at Google TechTalks.

[1]http://www.youtube.com/watch?v=5S1d3cNge24#t=32m55s

[2]http://www.youtube.com/watch?v=hBpetDxIEMU#t=40m38s

gsnedders11y ago

To quote Levinson (2000): "inference is cheap, articulation expensive, and thus the design requirements are for a system that maximizes inference". All natural languages [citation needed] rely heavily on inference — simply because it allows one to minimize what's said. That's not to say that everything maximally relies on inference — politeness phenomena clearly demonstrate otherwise, often being "needlessly" verbose.

andreasvc11y ago

I agree, and additionally I would say we shouldn't remove ambiguity for a more basic reason: humans are very good at resolving it. There may be lots of other areas where we have systematic biases such as estimating probabilities, but dealing with ambiguity is actually one of our big strengths so it doesn't make sense to spend so much energy on avoiding it.

This is a bit speculative, but I have the feeling there is a certain kind of personality that is typically fascinated with this idea of being able to rule out ambiguity. It seems to me a tendency to overextend the idea of mathematical rigor to other areas of life.

ar-jan11y ago

The point has also been made that ambiguity is a functional property of language, because it is more efficient from an information-theoretic point of view: context also provides information, but a purely unambiguous language would have to express that information anyway. Ambiguity also allows reusing words and sounds. (http://www.sciencedirect.com/science/article/pii/S0010027711...).

igravious11y ago

Couldn't agree more. Seven Types of Ambiguity, published as far back as 1930 by William Empson, launched a school of criticism called New Criticism. A definition of ambiguity is then "alternative views might be taken without sheer misreading." For Empson, poetry is heavily reliant on ambiguity. And, arguably, poetry is language at its most wrought, with ideas most distilled.

moomin11y ago

One of the threads in David Brin's Jijo trilogy is that development in the galaxy is being held back by (amongst other things) designed, unambiguous languages.

JoeAltmaier11y ago

Ambiguity is necessary for art. If there is no room for interpretation, then it's just a photo-of-words.

ginko11y ago

I also wonder if a concise language like this wouldn't make lying easier, since you can manipulate the meaning of your words by ever so slightly altering their spelling and pronunciation.

alxndr11y ago

I think an unambiguous human-speakable language would be a great candidate to teach to a computer.

JoeAltmaier11y ago

"Among the Wakashan Indians of the Pacific Northwest, a grammatically correct sentence can’t be formed without providing what linguists refer to as “evidentiality,” inflecting the verb to indicate whether you are speaking from direct experience, inference, conjecture, or hearsay"

This is amazing. But I can't grasp the difference between inference and conjecture - they are both 'figuring out' what happened rather than knowing or hearing?

lotsofmangos11y ago

I wonder how well Ithkuil can be represented in Iain M. Banks' Marain script. http://trevor-hopkins.com/banks/a-few-notes-on-marain.html

gamegoblin11y ago

Ithkuil has well over double the number of phonemes that Marain does, so the answer would probably be "not so well".

lotsofmangos11y ago

Rotation and reflection of the basic set extend the phonemes and can link together similar sounding ones in Marain, so I would have thought it would be achievable.

arsalanb11y ago

>"Languages are something of a mess. They evolve over centuries through an unplanned, democratic process..."

I'm in awe of the creator of any language, because creating a (good) language isn't easy. This is true for both programming languages and otherwise. However, it goes without saying that adoption is a vital component of any language, and with mass adoption comes evolution.

People will often make changes in languages and create their own dialects (based on things they can perhaps relate to on a deeper level, etc.). This isn't a bad thing. To me it only signifies growth and expansion of the language.

pohl11y ago

I really enjoyed this article when it was new. Not long ago, when I was learning Octopress, my first post was Hello World in Rust and Ithkuil. (I just wanted to make sure code formatting was working.) I have no idea how correct the translation is. I just googled around until I found someone else's.

http://screaming.org/blog/2014/07/12/ettawil-cutx/

wyager11y ago

Can someone list a few popular constructed languages (maybe comparing them to programming languages)? I'd only heard of Lojban and Esperanto before reading this.

doublec11y ago

Try http://www.reddit.com/r/conlangs for discussion and lists of languages. There are many subreddits for specific languages. For example, Toki Pona [1] and Lojban [2].

[1] http://www.reddit.com/r/tokipona [2] http://www.reddit.com/r/lojban

mchaver11y ago

Toki Pona is a minimalist language with a bit of a following. It has 120 root words and tries to build all concepts based on those.

Loglan is a predecessor and inspiration to Lojban.

Slovio is the Slavic version of Esperanto.

Dothraki, Elvish (Quenya, Sindarin), Klingon, Na'vi are constructed languages from popular novels/movies.

dunham11y ago

One novel thing about Tolkien's languages: he constructed a root language (like proto-indo-european) and etymologies for them. (See http://en.wikipedia.org/wiki/The_Etymologies_(Tolkien) )

iLoch11y ago

> Slovio is the Slavic version of Esperanto.

Someone doesn't understand the point of Esperanto.

StavrosK11y ago

TFA:

> A sentence like “On the contrary, I think it may turn out that this rugged mountain range trails off at some point” becomes simply “Tram-mļöi hhâsmařpţuktôx.”

Wikipedia:

> Romanization: Oumpeá äx’ääļuktëx.

> Translation: "On the contrary, I think it may turn out that this rugged mountain range trails off at some point."

yongjik11y ago

That was an interesting read, but the reporter's breathless assertions frequently got in the way of appreciating Quijada and his idea.

I mean, things like:

> A sentence like “On the contrary, I think it may turn out that this rugged mountain range trails off at some point” becomes simply “Tram-mļöi hhâsmařpţuktôx.”

Simply?

We could have used the LZW algorithm and the sentence could probably become even shorter, just a "simple" sequence of random-ish bytes. If you increase the number of allowed symbols, of course you need fewer symbols to convey the same information. If you allow for a limitless set of words that are dynamically generated from combining many roots, of course the number of words decreases... sometimes down to 1, as in polysynthetic languages. This is Information Theory 101.
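To make the symbol-count point concrete, here is a minimal Python sketch. The 64-bit payload and the alphabet sizes are illustrative assumptions, not figures from the article, and `symbols_needed` is a hypothetical helper name.

```python
def symbols_needed(bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n >= 2**bits (exact integer math)."""
    if alphabet_size < 2:
        raise ValueError("alphabet must have at least two symbols")
    target = 1 << bits          # number of distinct messages to encode
    n, capacity = 0, 1
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


if __name__ == "__main__":
    # Assumed payload: 64 bits of information; alphabet sizes are arbitrary.
    for size in (2, 16, 26, 58, 256):
        print(f"{size:>3} symbols -> {symbols_needed(64, size)} positions")
```

A larger alphabet always needs the same number of positions or fewer, which is all the "shorter sentence" claim amounts to on its own.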

moconnor11y ago

The English text is 97 ASCII encoded bytes.

Compressed with zlib: 86 bytes.

Compressed with lzma: 98 bytes.

The Ithkuil representation is just 30 UTF-8 encoded bytes.

Compressed with zlib: 39 bytes.

Compressed with lzma: 47 bytes.

(Measured using Python's zlib/pylzma modules to avoid e.g. file header overhead)

It's hard to achieve this kind of compression without an external dictionary. What Quijada has created with Ithkuil is, in part, a dictionary for the space of human thought and concepts, something I wouldn't have expected to work in the way the article describes it.
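For anyone who wants to repeat the measurement, a rough sketch using only Python's standard library follows. It assumes the strings exclude the surrounding quotes and the final period, uses the standard `lzma` module in raw mode as a stand-in for pylzma, and picks compression settings arbitrarily, so the compressed sizes it prints need not match the figures above.

```python
import lzma
import unicodedata
import zlib

ENGLISH = ("On the contrary, I think it may turn out that this rugged "
           "mountain range trails off at some point")
# NFC keeps each accented letter as a single precomposed code point.
ITHKUIL = unicodedata.normalize("NFC", "Tram-mļöi hhâsmařpţuktôx")

# Raw LZMA2 stream with no container header (assumed stand-in for pylzma).
RAW_LZMA_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def sizes(text: str) -> dict:
    """Return byte counts for the UTF-8 text, raw and compressed."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data, format=lzma.FORMAT_RAW,
                                  filters=RAW_LZMA_FILTERS)),
    }


if __name__ == "__main__":
    for name, text in (("English", ENGLISH), ("Ithkuil", ITHKUIL)):
        print(name, sizes(text))
```

On inputs this short, general-purpose compressors mostly add framing overhead, which is why the compressed sizes hover around or above the raw ones.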

Dylan1680711y ago

Actually, using zlib format gets you an unnecessary 2 byte header and 4 byte footer, so the proper sizes are 80 and 33.

I'm having trouble figuring out what's going on with lzma because the spec is lying about the header, so I won't attempt to guess the correct number there.
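The 6-byte wrapper is easy to check. This sketch compares the zlib format against a bare DEFLATE stream at the same compression level; the level is an arbitrary choice, and the sample string is assumed to exclude quotes and the final period.

```python
import zlib


def zlib_size(data: bytes, level: int = 9) -> int:
    """Size in zlib format: 2-byte header + DEFLATE body + 4-byte Adler-32."""
    return len(zlib.compress(data, level))


def raw_deflate_size(data: bytes, level: int = 9) -> int:
    """Size of the bare DEFLATE stream (negative wbits drops the wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return len(comp.compress(data) + comp.flush())


if __name__ == "__main__":
    sample = ("On the contrary, I think it may turn out that this rugged "
              "mountain range trails off at some point").encode("utf-8")
    wrapped, bare = zlib_size(sample), raw_deflate_size(sample)
    print(f"zlib: {wrapped} bytes, raw deflate: {bare} bytes, "
          f"overhead: {wrapped - bare} bytes")
```

The difference should come out to the 2 + 4 bytes described above, regardless of the input.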

based211y ago

http://www.reddit.com/r/linguistics/comments/2dlsgl/utopian_...

mariusz7911y ago

While it looks like an impossible language to use every day, I'm wondering if it could be used for science and technology. Just imagine having all scientific papers in it :)

alxndr11y ago

I'm amazed that the article doesn't mention Lojban at all.

thisjepisje11y ago

Off topic: Are the drop caps supposed to be lower than the line of text to which they belong? It looks kind of silly IMO.

lawlessone11y ago

Any font files for this? Would be interesting to use.

stuaxo11y ago

If there was a site that summarised New Yorker articles in 2 pages I would be there in a flash.

Fastidious11y ago

That would be atrocious! You need to savour and enjoy the reading, just the same as you enjoy a nice drink, or a good cup of coffee, or take the time to make coitus a never-ending engagement.

Just enjoy it.

StavrosK11y ago

Sometimes I need a quick snack, for the calories.

JohnTHaller11y ago

You can't say that on Hacker News. The downvote brigade will nail you every time for disagreeing with them. You cannot comment on the verbosity of articles or unrelated extra paragraphs and asides that don't serve the overall narrative in The New Yorker, The Atlantic, etc.

QuantumGood11y ago

Seems this could contribute to accelerating artificial intelligence towards the possibility of the singularity.

_9hey11y ago

another hacker news TL;DR article

knieveltech11y ago

Too bad. It's a pretty good read.

JohnTHaller11y ago

It's a very interesting article. But it's done in the old journalism/academic paper style, where it takes 5 pages to get to the point and has huge multi-paragraph asides that the reader is often uninterested in. I already know the history of Esperanto... most people won't even care about it. I don't care at all that George Soros learned it as his first language; it's unrelated nonsense. Tell me about the topic of the article. If you want interested people to be able to learn more about Esperanto, link to a side article. We can do this today.

selimthegrim11y ago

Two things struck me about this article in hindsight when I read it.

-- Whose pot did the Croats, Bosnians and Slovenes piss in to not make it into this super Slavic union?

-- China Miéville wrote a book[0] along very similar thought lines which won the Locus Award.

Also, Garkavenko appears not to have taken the obvious side [1] in Ukraine's present conflict, given how he is described in Foer's article.

[0] https://en.wikipedia.org/wiki/Embassytown

[1] http://maidantranslations.com/2014/06/24/russian-volunteers-...
