"The C specification says that when there is such an ambiguity, munch as much as possible. (The "greedy lexer rule".)"
So j+++++k turns into:
j++ ++ + k
Which is clarified on the next slide.
I would have guessed that j++ ++ was not legal syntax.
So, I was wrong: There are two ways to parse that mess. So, there is ambiguity. And the way they resolve the ambiguity is their 'greedy' rule! Wow!
Net, that tricky stuff is too tricky for me.
There was a famous investor in Boston who said that he invests only in companies that any idiot could run well, because the chances were too high that, too soon, some idiot would be running the company.
Well, I want code, or at least language syntax, that any idiot can understand, for now, me, and later some of the people that might be working for me!
You are way ahead of me on C, and you leave me more afraid of it than I was. But then I was always afraid of it and, in particular, never wrote ++.
So, my first issue was the C statement
i = j+++++k;
So, to make some tests, I dusted off my ObjectRexx script for doing C compiles, links, and execution.
Platform: Windows XP SP3 with recent updates. And apparently somehow I have
Visual Studio 2008 x86 32 bit
installed, and it has relevant "tools", e.g., a C/C++ compiler, linker, etc.
I don't use IDEs or Visual Studio and instead, apparently like a significant fraction of readers at HN, write code with my favorite text editor (e.g., KEdit) and some command line scripts (using ObjectRexx, which is elegant, but for better access to Windows services, etc., I should likely convert to Microsoft's PowerShell).
So, I typed in some C code and tried to compile it. Then I encountered again one of the usually unmentioned problems in computing: Software installation and system management. Several hours later I had a C/C++ 'compile, load, and go' (CLG) script working, but my throat was sore from screaming curses at the perversity of 'system management' -- a project of a few minutes with a prerequisite of several hours of system management mud wrestling.
For the mud wrestling, the first problem was that, since my last use of C, I had changed my usual boot partition from D to E. Next, the version of C installed on E was different from that on D. And the installation on D would not run when E was booted. Bummer.
Next, the C compiler, linker, etc. want a lot of environment variables. Fine with me; generally I like the old PC/DOS idea of environment variables.
However, apparently Microsoft was never very clear on just what software, when, could change the environment variables where. At least I wasn't clear.
So, booting from my partition E, the C/C++ tools want environment variables set as in
E:\Program Files\Microsoft Visual Studio 9.0\Common7\Tools\vsvars32.bat
Okay. Nice little BAT file.
If I run the BAT file from a console window, it changes the environment variables as needed by C/C++. But in console windows I run a little 'shell script' I wrote in ObjectRexx. It has a few nice features for directory tree walking, etc. But when I run the BAT file from the command line of a console window that is running my little shell script, after the BAT file is done and returns, the environment variables have been restored to what they were before running the BAT file. If I use a statement, say,
set >t1
at the end of the BAT file, then file t1 shows that the environment variable values have been changed while the BAT file was still running.
So, sure, there is a 'stack' of invocations of processes, applications, or whatever in the console window and its address space, and, somehow, since my shell script was in the stack, when the BAT file quit, the stack and its collection of environment variables was popped back to what they had been.
But eventually I relented, gave up on this little project taking just a few minutes, slowed down, thought a little, read some old notes, discovered that I should change the environment variables within my ObjectRexx script, using an ObjectRexx function for that purpose, as needed by C/C++ CLG, found the needed changes, implemented them, and, presto, got a C/C++ CLG script that works while my shell script is running and while I am booted from my drive E.
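One likely reading of the behavior above, offered as an assumption rather than a diagnosis: the BAT file ran in a child command interpreter, and a child's changes to its environment are not copied back to the process that launched it. A minimal C sketch of that effect follows. DEMO_CLG_VAR and child_change_visible are hypothetical names made up for this illustration, and the command string assumes cmd.exe on Windows or a POSIX sh elsewhere.

```c
#include <stdlib.h>

/* DEMO_CLG_VAR is a made-up variable name used only for this sketch. */
#ifdef _WIN32
#define CHILD_CMD "set DEMO_CLG_VAR=changed"
#else
#define CHILD_CMD "DEMO_CLG_VAR=changed; export DEMO_CLG_VAR"
#endif

/* Run a child command interpreter that sets the variable, then report
 * whether this (parent) process sees the variable:
 *   1 = set in the parent, 0 = not set, -1 = child could not be started. */
int child_change_visible(void)
{
    int status = system(CHILD_CMD);  /* the assignment happens in the child */
    if (status == -1)
        return -1;
    return getenv("DEMO_CLG_VAR") != NULL;
}
```

Assuming DEMO_CLG_VAR was not already set before the call, the expected result is 0: the child set the variable, the child exited, and the parent's environment is unchanged. That is consistent with setting the variables from inside the ObjectRexx script itself, as described above.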
On to the C question:
For 'types', the test program has
int i, j, k;
For
i = j+++++k;
my guess was that this would parse only one way,
i = (j++) + (++k)
and be legal. And as I recall, but likely no longer have good notes, some years ago on OS/2, PC/DOS, or an IBM mainframe,
i = j+++++k;
was legal.
Not now! With the C/C++ tools with
Visual Studio 2008 x86 32 bit
statement
i = j+++++k;
gives C/C++ compiler error message
error C2105: '++' needs l-value
So, that's an L-value or 'left value' or something that the 'operator' ++ can increment.
So, it wasn't clear how the compiler was parsing. So, I tried
i = j++ ++ +k;
and it also resulted in error C2105: '++' needs l-value
So, likely the ++ that is causing the problem is the second one.
So, I tried
i = (j++)++ + k;
and still got error C2105: '++' needs l-value
Then I tried
i = j++ + ++k;
and it worked as I would hope: k was incremented by 1 and added to j, the sum was assigned to i, and then j was incremented by 1.
Then I tried
i = j+++k;
Surprise! It's legal! j and k are added and the sum is assigned to i, and then j is incremented by 1.
So, I long ago concluded that to understand some of the tricky, sparse syntax of the language, not clearly explained in K&R, I have to write and run test cases as here. Bummer. But, as below, here I'm significantly wrong.
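As a sketch of the test cases above collected in one place, here are the two statements that compiled, wrapped in small functions so the results can be checked. The names result, spaced, and packed are made up for this illustration, and the rejected statements appear only in a comment because they do not compile.

```c
/* The value assigned to i, plus j and k after the statement has run. */
struct result { int i, j, k; };

/* i = j++ + ++k;  written with explicit spaces */
struct result spaced(int j, int k)
{
    struct result r;
    r.i = j++ + ++k;   /* old j plus already incremented k */
    r.j = j;
    r.k = k;
    return r;
}

/* i = j+++k;  tokenized as j ++ + k, that is (j++) + k */
struct result packed(int j, int k)
{
    struct result r;
    r.i = j+++k;       /* old j plus unchanged k */
    r.j = j;
    r.k = k;
    return r;
}

/* Rejected by the compiler, so left commented out:
 *   i = j+++++k;
 *   i = (j++)++ + k;
 */
```

Starting from j = 2 and k = 5, spaced(2, 5) gives i = 8, j = 3, k = 6, and packed(2, 5) gives i = 7, j = 3, k = 5, which matches the behavior described above.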
Possible to make sense out of this?
Maybe: If I start reading
Brian W. Kernighan and Dennis M. Ritchie, 'The C Programming Language, Second Edition', ISBN 0-13-110362-8, Prentice-Hall, Englewood Cliffs, New Jersey, 1988.
in "Appendix A: Reference Manual" on page 191, then I hear about 'tokens' and 'white space' to separate tokens.
Okay, no doubt + and ++ are such 'tokens'.
Continuing, right away on page 192 we have
"If the input stream has been separated into tokens up to a given input character, the next token is the longest string of characters that could constitute a token."
I would have said "up to and including a given input character", but K&R are 'sparse'!
So, with this parsing rule, in
j+++k
the tokens are
j
++
+
k
which is essentially
(j++) + k
which is legal, but in
j+++++k
the tokens are
j
++
++
+
k
which would be essentially
(j++)++ + k
where the second ++ does not have an 'L-value' to act on, since the result of j++ is a value and not a variable.
So, my remark that
j+++++k
can parse only one legal way is irrelevant because that is not how the C parsing works.
Basically I was assuming a 'token getting' parsing rule like I've implemented a few times in my own work: There are tokens and delimiters, and a 'token' is the longest string of characters bounded by delimiters but not containing a delimiter. The delimiters are white space, (), etc.
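The longest-token rule quoted from K&R can be sketched in a few lines of C. This is only an illustration and not how any real compiler is written: the operator table OPS is deliberately tiny, and the names next_token and munch are made up here. At each position the code takes the longest operator that matches, without looking at whether the result will parse.

```c
#include <ctype.h>
#include <string.h>

/* A tiny operator table for illustration; real C has many more.
 * Longer operators come first so the first match is the longest. */
static const char *const OPS[] = { "++", ">=", "+", ">", "=" };
#define N_OPS (sizeof OPS / sizeof OPS[0])

/* Copy the longest token starting at s into tok (capacity cap).
 * Returns its length, or 0 on an unknown character or lack of room. */
size_t next_token(const char *s, char *tok, size_t cap)
{
    size_t n = 0, i, len;

    if (isalnum((unsigned char)s[0]) || s[0] == '_') {
        /* identifier or number: take every such character in a row */
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
    } else {
        for (i = 0; i < N_OPS; i++) {
            len = strlen(OPS[i]);
            if (strncmp(s, OPS[i], len) == 0) {
                n = len;
                break;
            }
        }
    }
    if (n == 0 || n >= cap)
        return 0;
    memcpy(tok, s, n);
    tok[n] = '\0';
    return n;
}

/* Split src into tokens and write them, separated by single spaces,
 * into out (capacity cap, which must be at least 1). */
void munch(const char *src, char *out, size_t cap)
{
    char tok[32];
    size_t used = 0, n;

    out[0] = '\0';
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }
        n = next_token(src, tok, sizeof tok);
        if (n == 0 || used + n + 2 > cap)
            break;                      /* unknown character or out full */
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, tok, n + 1); /* copies the terminating '\0' too */
        used += n;
        src += n;
    }
}
```

Given "j+++++k", munch produces "j ++ ++ + k", and given "j+++k" it produces "j ++ + k". Given "j>=k" it produces "j >= k" with no white space needed in the input, which a delimiter-only rule would not manage.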
K&R seem to have a point: My parsing rule would have trouble with just
j>=k
and, instead, would require writing
j >= k
which I do anyway.
Generally, though, the C syntax is sparse and tricky, so tricky it stands to be error prone.
Back to writing Visual Basic .NET.