For short chunks of program text, we can probably rely to some extent on our natural language abilities, which let us cope with transformational syntax and ambiguity. That is to say, we have a kind of general parsing algorithm that is far more powerful than programming language syntax requires, but which works only over small peepholes. Most speakers will not understand (let alone be able to produce) a correctly formed sentence that is too long or too deeply nested. It's as if the brain has a fixed-size pattern space into which a sentence has to fit; if it fits, a powerful pattern-matching network sorts it out. A programming language parser, by contrast, is unfazed by a single construct spanning thousands of lines or nesting hundreds of levels deep; that is just a matter of resources: enough stack depth and so on. As long as the grammar rules are followed and the resources hold out, size makes no difference to comprehension.
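To make that concrete, here is a minimal sketch in C (invented for illustration, not modeled on any particular parser): a recursive-descent recognizer for nested parenthesis groups whose only limit on depth is the call stack.

    #include <stdio.h>

    /* Recursive-descent recognizer for nested parenthesis groups.
       Its only limit on nesting depth is the call stack; give it
       more stack and it happily goes deeper. */
    static const char *parse_group(const char *s) {
        if (*s != '(') return NULL;
        s++;                                /* consume '(' */
        while (*s == '(') {                 /* nested and sibling groups */
            s = parse_group(s);
            if (s == NULL) return NULL;
        }
        if (*s != ')') return NULL;
        return s + 1;                       /* consume ')' */
    }

    int main(void) {
        enum { DEPTH = 1000 };              /* a thousand levels of nesting */
        char buf[2 * DEPTH + 1];
        for (int i = 0; i < DEPTH; i++) {
            buf[i] = '(';
            buf[2 * DEPTH - 1 - i] = ')';
        }
        buf[2 * DEPTH] = '\0';

        const char *end = parse_group(buf);
        printf("%s\n", (end != NULL && *end == '\0') ? "parsed" : "rejected");
        return 0;
    }

It accepts a thousand levels of nesting as readily as two; no human reader could say the same.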
When reading code, people rely on cues like indentation, and on trust that conventions have been followed, particularly for larger structures. Even relatively uncomplicated constructs have to be broken across multiple lines and indented; the amount of syntactic complexity the brain can handle in a single line of code is quite small.
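For instance, here is the same condition written twice in C, once on a single line and once broken up by convention; the node type, fields, and names are hypothetical, made up purely for illustration.

    #include <stdio.h>

    /* Hypothetical AST node, invented for this example. */
    enum kind { KIND_CALL, KIND_INDEX, KIND_OTHER };
    struct node {
        enum kind kind;
        int child_count;
        int resolved;
    };

    static void process(struct node *n) { (void)n; printf("processing\n"); }

    int main(void) {
        struct node n = { KIND_CALL, 2, 0 };
        struct node *node = &n;

        /* One line: trivial for the parser, hostile to the reader. */
        if (node != NULL && (node->kind == KIND_CALL || node->kind == KIND_INDEX) && node->child_count > 0 && !node->resolved) process(node);

        /* The same condition, broken and indented by convention, so the
           structure fits the reader's pattern space. */
        if (node != NULL
                && (node->kind == KIND_CALL || node->kind == KIND_INDEX)
                && node->child_count > 0
                && !node->resolved) {
            process(node);
        }
        return 0;
    }

The compiler sees two identical conditions; the reader sees one opaque wall and one legible shape.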
We also rely on trust that the code is mostly right: we try to understand or intuit the intent of the code and then trust that it implements that intent, or mostly does. If something looks ambiguous, having one interpretation that matches what we take to be the apparent intent and one or more others that don't, we tend to brush the others aside: "Surely the code must have been tested to be doing the right thing, right? Furthermore, if the wrong interpretation were the real one, the program would misbehave in certain ways (I'd guess), and in my experience with the program, it does no such thing. And anyway, this particular code isn't even remotely near the problem I'm looking for ..."
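C's operator precedence supplies a classic instance of this kind of shrugged-off ambiguity (the flag constant and values below are made up): a bitmask test that reads one way, parses another, and happens to pass testing anyway.

    #include <stdio.h>

    /* FLAG_READY is a made-up constant for illustration. */
    #define FLAG_READY 0x4u

    static void check(unsigned flags) {
        /* Apparent intent: is the FLAG_READY bit set?
           Actual parse: flags & (FLAG_READY == FLAG_READY), i.e. flags & 1,
           because == binds more tightly than & in C. */
        int as_written = (flags & FLAG_READY == FLAG_READY) != 0;

        /* What the author almost certainly meant: */
        int as_intended = ((flags & FLAG_READY) == FLAG_READY);

        printf("flags=0x%x: as-written=%d, as-intended=%d\n",
               flags, as_written, as_intended);
    }

    int main(void) {
        check(0x5u);  /* misparse and intent agree: the bug survives testing */
        check(0x6u);  /* they diverge: the latent misparse shows itself */
        return 0;
    }

With flags of 0x5 the misparse and the intent happen to agree, so the program "does no such thing" as misbehave; only an input like 0x6 exposes the difference, which is exactly why the trusting reader sails past it.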