For short chunks of program text, we can probably rely to some extent on our natural language abilities, which let us cope with transformational syntax and ambiguity. That is to say, we have a kind of general parsing algorithm that is far more powerful than programming language syntax requires, but which works only over small peepholes. Most speakers will not understand (let alone be able to produce) a correctly formed sentence that is too long or too deeply nested. It's as if the brain has a fixed-size pattern space into which a sentence has to fit; if it fits, a powerful pattern-matching network sorts it out. A programming language parser, by contrast, is unfazed by a single construct spanning thousands of lines or nesting hundreds of levels deep; that is just a matter of resources: enough stack depth and so on. As long as the grammar rules are followed and the resources hold out, size makes no difference to comprehension.
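To make that concrete, here is a minimal sketch in C (invented for illustration, not modeled on any particular parser): a recursive-descent recognizer for nested parenthesis groups whose only limit on depth is the call stack.

    #include <stdio.h>

    /* Recursive-descent recognizer for nested parenthesis groups.
       Its only limit on nesting depth is the call stack; give it
       more stack and it happily goes deeper. */
    static const char *parse_group(const char *s) {
        if (*s != '(') return NULL;
        s++;                                /* consume '(' */
        while (*s == '(') {                 /* nested and sibling groups */
            s = parse_group(s);
            if (s == NULL) return NULL;
        }
        if (*s != ')') return NULL;
        return s + 1;                       /* consume ')' */
    }

    int main(void) {
        enum { DEPTH = 1000 };              /* a thousand levels of nesting */
        char buf[2 * DEPTH + 1];
        for (int i = 0; i < DEPTH; i++) {
            buf[i] = '(';
            buf[2 * DEPTH - 1 - i] = ')';
        }
        buf[2 * DEPTH] = '\0';

        const char *end = parse_group(buf);
        printf("%s\n", (end != NULL && *end == '\0') ? "parsed" : "rejected");
        return 0;
    }

It accepts a thousand levels of nesting as readily as two; no human reader could say the same.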
When reading code, people rely on cues like indentation, and on trust that conventions have been followed, particularly for larger structures. Even relatively uncomplicated constructs have to be broken across multiple lines and indented; the amount of syntactic complexity the brain can handle in a single line of code is quite small.
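For instance, here is the same condition written twice in C, once on a single line and once broken up by convention; the node type, fields, and names are hypothetical, made up purely for illustration.

    #include <stdio.h>

    /* Hypothetical AST node, invented for this example. */
    enum kind { KIND_CALL, KIND_INDEX, KIND_OTHER };
    struct node {
        enum kind kind;
        int child_count;
        int resolved;
    };

    static void process(struct node *n) { (void)n; printf("processing\n"); }

    int main(void) {
        struct node n = { KIND_CALL, 2, 0 };
        struct node *node = &n;

        /* One line: trivial for the parser, hostile to the reader. */
        if (node != NULL && (node->kind == KIND_CALL || node->kind == KIND_INDEX) && node->child_count > 0 && !node->resolved) process(node);

        /* The same condition, broken and indented by convention, so the
           structure fits the reader's pattern space. */
        if (node != NULL
                && (node->kind == KIND_CALL || node->kind == KIND_INDEX)
                && node->child_count > 0
                && !node->resolved) {
            process(node);
        }
        return 0;
    }

The compiler sees two identical conditions; the reader sees one opaque wall and one legible shape.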
We also rely on trust that the code is mostly right: we try to understand or intuit the intent of the code and then trust that it implements that intent, or mostly does. If something looks ambiguous, having one interpretation that matches what we take to be the apparent intent and one or more others that don't, we tend to brush the others aside: "Surely the code must have been tested to be doing the right thing, right? Furthermore, if the wrong interpretation were the real one, the program would misbehave in certain ways (I'd guess), and in my experience with the program, it does no such thing. And anyway, this particular code isn't even remotely near the problem I'm looking for ..."
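C's operator precedence supplies a classic instance of this kind of shrugged-off ambiguity (the flag constant and values below are made up): a bitmask test that reads one way, parses another, and happens to pass testing anyway.

    #include <stdio.h>

    /* FLAG_READY is a made-up constant for illustration. */
    #define FLAG_READY 0x4u

    static void check(unsigned flags) {
        /* Apparent intent: is the FLAG_READY bit set?
           Actual parse: flags & (FLAG_READY == FLAG_READY), i.e. flags & 1,
           because == binds more tightly than & in C. */
        int as_written = (flags & FLAG_READY == FLAG_READY) != 0;

        /* What the author almost certainly meant: */
        int as_intended = ((flags & FLAG_READY) == FLAG_READY);

        printf("flags=0x%x: as-written=%d, as-intended=%d\n",
               flags, as_written, as_intended);
    }

    int main(void) {
        check(0x5u);  /* misparse and intent agree: the bug survives testing */
        check(0x6u);  /* they diverge: the latent misparse shows itself */
        return 0;
    }

With flags of 0x5 the misparse and the intent happen to agree, so the program "does no such thing" as misbehave; only an input like 0x6 exposes the difference, which is exactly why the trusting reader sails past it.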