When it was written, a beginning C programmer was most likely coming from a background in assembly and accessing the computer as a professional in the workplace or a student with substantial privileges. The intended audience was sitting near the cutting edge and was assumed to be sophisticated. Data validation could be left as an exercise for the reader in good conscience.
This audience is distinctly different from those who learned programming typing code from magazines and Shaw's current audience for whom he is simulating that experience.
Editorially, K&R has chosen to remain a slender tome. It has let others create the fat cookbooks and the "for idiots" guides. Forty years on, Shaw criticizes the Wright Flyer by the criteria of Second World War aviation.
We don't hold K&R on a pedestal because of its pedagogical methods, but because of the power of the language it describes. The C Programming Language was a byproduct of creating a language.
Kernighan and Ritchie were programming. Their book is properly judged by different standards than Shaw's educational project.
None of which is to suggest that Shaw's project may not achieve a comparable level of esteem.
That's what he means when he says it should be relegated to 'history.'
I'd submit that K&R is the fastest way to learn C if you knew nothing about C, or the best way to acquire the Zen of C, and that for those purposes it's still second to none, and its conciseness is a virtue ... but that Zed might be right about people pointing to it as a paragon of style to newbies who hope to write it professionally someday.
In fact, I'm disappointed that Zed let this project drop. Seemed neat.
I suppose it's not very interesting to bash someone's good work. I think the most important things he said are in "An overall critique".
So you're basically conceding the main point: K&R is obsolete and unsuitable for education. I beg to differ. It's still a very good read that stands apart from today's clumsy, overwrought introductory textbooks. It can teach you a lot of tricks that are not easily found elsewhere, and does so at a speed that allows absorbing the whole thing in much less than a semester.
What has changed is the size of the audience for C language learning materials. Until GNU/Linux, obtaining a C compiler required substantial effort for a typical computer user with AmigaDOS, MacOS, or Windows 3.x and 9600 baud bandwidth.
K&R uses examples to illustrate points. It was never intended to teach the art of computer programming. It recognizes that there are better resources for that - though I suppose someday somewhere someone will criticize Knuth for not providing a pseudo-code compiler on his website.
I disagree. I think OP was just saying that sometimes you might need more than one book, given that the original audience had pre-existing knowledge that tended to fill in the gaps. This might appear to be unsuitability "for education" within a popular marketplace of "Learn X in 24 hours" books, and make no mistake that this is the marketplace that Shaw is going after, so it's really a quibble about the level of expertise of your target market. "Rubes can't learn C from K&R." Yeah, well they never have.
Zed does acknowledge as much but this is worth pointing out. For some reason contextual intent and "intended audience" are missed by programmers (who are typically stereotyped for giving answers at a technical depth irrelevant to the recipients).
With that in mind, I think that describing the intended audience and context (as you and others in the comments have done) is a more valuable exercise. It makes K&R (or any other book) accessible and applicable to a new generation. Such posts won't normally get much attention - they lack a certain rebellious and inflammatory flair we know and love so well...
It's basically impossible to adequately warn people that K&R C is unsafe when exposed to the real world, so it's an unsafe suggestion in the way it's usually suggested.
Everyone agrees that NUL-terminated C strings are the wrong thing to do now; this is why virtually every modern language (including Go, which is obviously very C-inspired) splits a string into data and length as separate entities (even if this separation is mostly hidden from the language user).
But when you only have a handful of thousands of bytes for an entire system, you just have to accept some amount of unsafeness as a practical reality. Should K&R be amended with a warning lest it mislead people working on modern systems? Maybe. But I don't think it is fair to code review it using modern thinking about the expense of different operations and the modern luxury of vast amounts of memory.
// use heap memory as many modern systems do
char *line = malloc(MAXLINE);
char *longest = malloc(MAXLINE);
assert(line != NULL && longest != NULL && "memory error");
// initialize it but make a classic "off by one" error
for(i = 0; i < MAXLINE; i++) {
    line[i] = 'a';
}
So, you create something that does not fulfill the C library invariant of what constitutes a "string", and then pass it to a copy function that assumes this invariant? It isn't a fair thing to do, and frankly, I doubt many beginner programmers care about things like this. Yes, they may run into such a "defect" and be very miserable for a while, but that will just teach them about debugging and, most importantly, invariants.
Zed, I appreciate your work, but if this is the direction you'll be taking with these articles, then don't bother.
From the article:
> Some folks then defend this function (despite the proof above) by claiming that the strings in the proof aren't C strings. They want to apply an artful dodge that says "the function is not defective because you aren't giving it the right inputs", but I'm saying the function is defective because most of the possible inputs cause it to crash the software.
Zed's point is not that K&R is bad because their example code doesn't match C library invariants, he's saying it's bad because it encourages people to write code that blindly assumes C library invariants will always hold.
Certainly, if you're going to write C code there's some things that really do require blind trust (for example: that your code will be compiled by a conformant C compiler), but "all strings are safely null-terminated" is incorrect so often, and the cause of so many historic security vulnerabilities and crashes, that perhaps we shouldn't be encouraging new C programmers to do it.
Because we wouldn't want a function to accidentally process data that wasn't meant for it, no?
The accurate fact is that K&R C is a book about C. It is not the end-all, but rather an introduction to the language. Sure, it has thorns. Sure, you'd be a fool to adopt the style from it; this speaks more of the culture of its readership than the book itself, however. The authors are very honest that their samples are an attempt to engage the reader's attention in the Language; especially the Ingenue, new-comer, non-Professional C programmer.
To that end, the book succeeds; new C programmers get an introduction, a light read, a good set of nomenclature to understand the topic further, and so on. It is not intended, in spite of the cultural proclivity towards these things, to be "A Bible of C".
And if it were, no professional C coder worth their salty words these days would be without the New Testament, right alongside K&R on the neglected end of the bookstack, which book is of course: "Expert C Programming: Deep C Secrets" which explains rather a lot more about the thorns of Professional C, and more, in an equally comfortable manner as both K&R, the Authors of C as well as books about C, have done.
In my opinion, Peter van der Linden has already done to K&R what Zed doesn't seem to have the humility to do; proven its value to the newcomer in becoming one step closer to a professional.
It will lead to misery.
I expect you will point us to your prodigious output in the language, and that it was only by accident that you forgot to add any points of critique in your comment.
But this thread on the book is a pretty good resource on issues and gotchas for newbies.
I largely use Python, I've dabbled in C and always mean to learn more. K&R to me is the touchstone for that, just because so much of programming culture stems off of it. I find knowing these historical patterns helpful for understanding how programmers talk.
For someone like me, respectful critique of its style and decisions helps separate the good from the less-helpful. Whether Zed intended to start that discussion or not, it's still helpful.
Well, life as a program is not fair either.
The problem is your function can be used in many contexts, including by other people. You should not expect them to be fair, you should make your function robust.
That said, he appears to be dealing in absolutes too much. If you care about performance (and let's face it, if you're using C you do, otherwise you probably shouldn't be using C) then sometimes you can't handle as much error checking or error correcting as you'd like.
In games (where most of my experience is), it's common to have functions that are 'unsafe' by his definition, but that are hidden in compilation units and not exposed in the header, so that the programmer can control exactly where they're called from. If you have a limited number of 'gatekeeper' interface functions that are actually called from outside the module, and these check/sanitise/assert on their inputs, then the internal private functions can safely assume that they have valid input and just run as quickly and as simply as possible.
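A rough sketch of that gatekeeper pattern, for readers who haven't seen it. The names `name_set` and `copy_unchecked` are made up for illustration and not from any real codebase; the only assumption is that the `static` helper is reachable solely through the checked entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal helper (hypothetical name). It trusts its caller: dst must have
 * room for n bytes plus a terminator, and src must hold n readable bytes.
 * 'static' keeps it out of the header and out of other compilation units. */
static void copy_unchecked(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL); /* debug-only check, gone with NDEBUG */
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Gatekeeper (hypothetical name): the one function the header would expose.
 * It validates everything, then hands off to the fast helper.
 * Returns 0 on success, -1 if the arguments are rejected. */
int name_set(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || dst_size == 0 || src_len >= dst_size)
        return -1;
    copy_unchecked(dst, src, src_len);
    return 0;
}
```

Calling `name_set(buf, sizeof buf, "abc", 3)` with an 8-byte `buf` succeeds, while a source length of 8 is rejected before the helper ever runs. The helper still trusts that `src_len` is honest, which is exactly the trade the comment describes.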
For instance:
- cross platform portability
- predictability
- reliability
- long term stability / maintainability
- low run-time overhead, fine grained control of memory
Long-time C coders are among the best programmers around. It's hard to understand just how damn good they are until one has gotten a chance to work with one or more of them.
The good ones know all levels of the hardware and software stack they're working with. Coupling this knowledge with the raw power of C, they can put together amazing and resource-efficient software in very little time, yet without sacrificing maintainability, security, portability and the other factors you've listed.
These are truly the people who make the impossible become possible.
- lightweight ABI, accessible from just about any other language on the planet.
- huge existing base of liberally-licensed code available for re-use
- basically the only language that is not a) legacy-stamped or b) bloated beyond learnability after more than two decades of use (or, in C's case, twice that.)
I seldom use C myself, but its strong points are undersold by the "garbage collection is too slooooow" kiddies.
It's similar to literature, where people tend to ascribe the word 'classic' to works and yet rarely bother to read, understand or appreciate them.
Your example of unsafe functions might work for you but might not be the best example in a book which is being revered as "the classic book on C" and which influenced almost every major language and book thereafter.
http://bloggingshakespeare.com/do-we-really-like-shakespeare
I think your point about unsafe functions being fine as long as access to them is only possible through controlled access points that verify sane input is one of the most interesting points I've seen in a while.
The mental image is an access panel, behind which lie hundreds of whirling razor-sharp gears--presumably if you open it you know what you're doing and are careful.
This is what people pushing for abstraction are about, and why they're correct in some cases.
You'd be HONOURED to work with people of that calibre, but instead you feel you need to show you're better than they are by kicking at them. Trust me, you can't kick that high up.
Long-term success might be a better metric than "style". Just sayin'.
/* bad use of while loop with compound if-statement */
while ((len = getline(line, MAXLINE)) > 0)
    if (len > max) {
        max = len;
        copy(longest, line);
    }
if (max > 0) /* there was a line */
    printf("%s", longest);
What do you do?
My personal solution (again for JS) would be to use a new line with no braces to split up an if-statement, but to never nest a braced statement as part of a pseudo-one-liner, nor to nest many levels of one-liners - as these situations could lead to confusion.
Eg for the above;
while ((len = getline(line, MAXLINE)) > 0) {
    if (len > max) {
        max = len;
        copy(longest, line);
    }
}
if (max > 0)
    printf("%s", longest);
This is a personal preference- i find it adequately splits up a one-line `if(max > 0) printf("%s", longest);` statement to be clearly identifiable as an if (while/for/etc) block, without the verboseness of the extra line/2 for braces, which i personally find makes code harder to read.
So I'd write:
if (max > 0) printf("%s", longest);
or
if (max > 0) {
    printf("%s", longest);
}
but never
if (max > 0)
    printf("%s", longest);
Fu that.
It's error prone, just like Zed says. Add braces, don't be lazy, it makes the code flow easier on the eye.
IMO, in JS, it's painfully dangerous to be "too" clever as well.
Sometimes I like to think about it like so: Will someone else "with less skill than me" be able to follow this code after me? Then again, being human I sometimes fail to think this way :(.
Complaining that this isn't easy enough is missing the point.
Arguing that the language shouldn't allow such constructs is again outside the scope of the argument. K&R designed the language, so presumably they agree with the design.
- if the line is so long that it will spill across several chunks of 'MAXLINE' length then you'll end up with the next to last bit
- you should probably use a character reader that uses realloc to resize the buffer area until even the longest strings fit or the program fails
- your getline invocation is wrong, the proper one is getline(&lineptr, &cursize, fp);
And the 'real' getline takes care of most of the above gripes, for instance all you'd have to do is to swap the pointer to 'longest' for a fresh one for the next iteration after finding a new longest entry.
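To make the pointer-swap idea concrete, here is a minimal sketch assuming a POSIX system where `getline(char **, size_t *, FILE *)` is available. The function name `longest_line` is my own, not something from K&R or the article.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns a malloc'd buffer holding the longest line
 * read from fp (caller frees), or NULL if nothing was read. The length,
 * including any trailing newline, is stored in *out_len when non-NULL. */
char *longest_line(FILE *fp, size_t *out_len)
{
    char *line = NULL, *longest = NULL;
    size_t line_cap = 0, longest_cap = 0, max = 0;
    ssize_t len;

    assert(fp != NULL);
    while ((len = getline(&line, &line_cap, fp)) > 0) {
        if ((size_t)len > max) {
            /* Swap buffers instead of copying. getline() will reuse or
             * grow whatever buffer 'line' points at on the next call. */
            char *tmp = longest;
            size_t tmp_cap = longest_cap;
            longest = line;
            longest_cap = line_cap;
            line = tmp;
            line_cap = tmp_cap;
            max = (size_t)len;
        }
    }
    free(line); /* getline may have allocated even on the failing call */
    if (out_len != NULL)
        *out_len = max;
    return longest;
}
```

Because getline() grows the buffer itself, there is no MAXLINE to overflow and no chunking of long lines. The cost is a dependency on POSIX rather than plain ISO C.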
I now understand why my subsequent programs, and those of many in my generation, have been riddled with bugs for 3 decades.
K&R C was best practice in 1980 or so, since then we've learned a lot about C, about what to do and what not to do. If you still program C like it is 1980 then you can't really blame that on a book from 1978.
That book was a first step for many of the devs that built many of the technologies that support everything we do online today.
A = {'a','b','\0'}; B = {'a', 'b', '\0'}; safercopy(2, A, 2, B);
A = {'a','b'}; B = {'a', 'b', '\0'}; safercopy(2, A, 2, B);
A = {'a','b','\0'}; B = {'a', 'b'}; safercopy(2, A, 2, B);
A = {'a','b'}; B = {'a', 'b'}; safercopy(2, A, 2, B);
This analysis only tries different values of A and B, not the lengths. A proper analysis of "for what values does it fail" should include all parameters. What happens if you do `safercopy(3, A, 2, B)` or `safercopy(3, A, 3, B)`?
The "correctness" the author is asking for here is not what you want from a typical C function. If you really need this kind of "correctness" then maybe you are using the wrong language and should check out either a high-level tolerant scripting language or a statically typed one.
Kernighan and Ritchie could have decided to write the book as absolute hard-asses, making the most bullet-proof copy routine imaginable, but in the end they would've been writing a different book, not a primer for a language.
It's been at least a year since the article first showed up on HN, and the author still hasn't made good on this promise. It's actually hard to do, because verbosity and clarity are usually at odds with each other.
One of the main attractions of K&R C is exactly its lack of verbosity, its extraordinarily high content-to-length ratio. In a very short space you learn to do a lot of sophisticated programming, and most if not all code examples fit on one page or less.
Of course, optimizing for conciseness has its costs, as anyone who has debugged segmentation faults knows. So you avoid some of this shooting-yourself-in-the-foot that C is infamous for by using various crutches: add an extra argument for some functions, build your own Pascal-style strings etc. And if you pass in external input then you should definitely use some of them, such as strlcpy (which is actually preferable to the four-argument function that this article is getting to).
But there are also lots of cases where plain old strcpy will do fine, and for simplicity's sake it's better to use it. I believe one of these cases is a learning experience in which you want to get the big story as soon as possible, and are willing to wait until later to get acquainted with the inevitable caveats and detours.
Funny.
The same goes for "Programming Erlang" by Armstrong.
On the contrary: "Erlang and OTP in Action" is quite boring (in comparison with the book by Armstrong). It's definitely not a book for "enlightenment" but for practice, sometimes dull: "Do it in such a way, you don't need to understand why, you'll get accustomed to it in future"
The real problem with C is that it relies on bare pointers, where it would have been better to use slice-type structures that describe a buffer by pairing the base pointer and size, so that they are naturally kept in sync. This article takes a lot of time to "deconstruct" C strings, but never gets to the real issue.
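For what it's worth, a slice of that kind is only a few lines of C. The sketch below is illustrative only: `struct slice`, `slice_copy` and `slice_sub` are hypothetical names, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: the base pointer and the length travel together,
 * so no terminator is needed to find the end of the buffer. */
struct slice {
    char  *data;
    size_t len;
};

/* Copy as many bytes as fit from src into dst and return the count.
 * The bound comes from the two lengths, never from the contents. */
size_t slice_copy(struct slice dst, struct slice src)
{
    size_t n = src.len < dst.len ? src.len : dst.len;
    assert(n == 0 || (dst.data != NULL && src.data != NULL));
    if (n > 0)
        memmove(dst.data, src.data, n);
    return n;
}

/* Sub-slice [from, to), with both bounds clamped to the parent slice. */
struct slice slice_sub(struct slice s, size_t from, size_t to)
{
    if (to > s.len)
        to = s.len;
    if (from > to)
        from = to;
    return (struct slice){ s.data + from, to - from };
}
```

With this, copying the unterminated `{'a','b'}` buffer from the article into a 4-byte destination copies exactly two bytes and stops. The lengths are still only as trustworthy as whoever built the slice, but at least they cannot drift away from the pointer.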
The "stylistic issue" is also debatable. With the indentation given in the example, nobody would think that the "while-loop will loop both if-statements", as the author claims.
What has Zed said about C that wasn't already answered more thoroughly by Go?
if you supply a function with inputs outside of its specification (NUL-terminated strings), then undefined behaviour is (by definition) going to happen.
besides, what's to stop someone from calling safercopy like so;
safercopy(strlen(str1), str1, strlen(str1), str2);
then strlen will fail (albeit a bit more safely - perhaps).
it's a safe bet that in production code we'll not be working with fixed length strings, so we need to get the length of the string somehow. all his safercopy does is identify a problem that he himself already points out is impossible to solve - how do we differentiate between a NUL-terminated string and one that isn't?
the only real solution (i can think of) is a string class, where the constructor is guaranteed to return valid (or no) strings. then (assuming other functions can't overwrite our memory - already an unsafe assumption) we could guarantee a safe string copy.
programming is hard.
Jokes apart, I admire the bravery in questioning K&R C's status. I have only a few personal insights to share as I don't code much these days.
1. I got introduced to C in the 80's via some popular book. K&R not only taught me C but also was my entry point into the systems programming world. Before K&R, the BASIC programming books never allowed me to deal with memory or interrupt vectors in the way C allowed me.
2. C has a great power dynamic built in to it. You are on your own in dealing with this power. It's you who crashed the machine, not C and surely not K&R C.
3. Almost every language since C has borrowed something from C. Hence, anytime I saw a familiar notation or code block in any language that reminded me of C, I got confidence that I can learn this language.
K&R's many virtues give it a unique status. It did something that no other C book or website can do. It is the word of the language designers themselves. They shared the reasons for the choices they made.
http://functional-orbitz.blogspot.se/2013/01/deconstructing-...
Criticizing K&R because of their safety assumptions is a faux pas.
Take this -
> Also assume that the safercopy() function uses a for-loop that does not test for a '\0' only, but instead uses the given lengths to determine the amount to copy.
It is possible to write a safercopy() conforming to that loose specification that will not terminate. Just make the test something like "i != length" instead of "i < length". Then you can supply negative length arguments and get it to fail. Of course, that would be stupid, but it already illustrates the art of specification. Well, with finite precision integers, "i != length" would terminate at some point due to wrap around, but it would take until the end of the universe if you'd used 128-bit integers. To do it even more simply, `safercopy(1,2,3,4)` can crash the program.
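For reference, one way to meet that loose specification while avoiding the "i != length" trap is sketched below. The argument order follows the calls quoted earlier in the thread; the body is my own assumption and is not presented as the article's code.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a length-driven copy: (from_len, from, to_len, to).
 * Returns the number of bytes copied, not counting the terminator,
 * or -1 if the arguments are rejected. */
int safercopy(int from_len, const char *from, int to_len, char *to)
{
    int i;
    int max;

    /* Reject negative lengths up front; to_len must leave room for '\0'. */
    if (from == NULL || to == NULL || from_len < 0 || to_len <= 0)
        return -1;

    max = from_len > to_len - 1 ? to_len - 1 : from_len;
    for (i = 0; i < max; i++) /* '<' rather than '!=': the loop is bounded */
        to[i] = from[i];
    to[i] = '\0';
    return i;
}
```

Here `safercopy(3, A, 2, B)` truncates to a single byte plus the terminator instead of overrunning B, and a negative length is refused. It still cannot detect a caller who passes lengths larger than the real buffers, so `safercopy(1,2,3,4)`-style misuse remains outside what the function can defend against.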
Is the moral of this story that programs are not valid outside the context they were created for? .. or is it to never use data structures whose integrity cannot be proved without failure? .. or is it that proving a program's correctness using some method only indicates a failure of imagination? .. or, to put it differently, that you can only prove a program wrong but never one right?