Walter Bright on C's Biggest Mistake

(dobbscodetalk.com)

64 points · kssreeram · 16y ago · 47 comments

InclinedPlane · 16y ago · 6 in thread

I'd say using null-terminated strings rather than Pascal-style length-embedded strings is C's biggest mistake. It is responsible for so many inefficiencies (strlen is O(n) instead of O(1) as it should be) and, worse yet, so many incredibly serious security vulnerabilities.

All to avoid having to incur a 1-3 byte per string overhead or figuring out how to efficiently work around a 255 character limit.

nitrogen · 16y ago

Other than strlen, string operations can be faster with null-terminated strings. Plus, there are other benefits:

If you want to turn some arbitrary data into a string, just put a 0 byte where you want it to end.

Instead of having to increment and compare both a counter and a pointer (or store a final pointer for comparison) in string-manipulation operations, you just increment the pointer. It makes for very concise loops in strchr, etc.

Tokenizing a string is just a matter of throwing down 0 bytes where the tokens are (as strtok does).

Passing a trailing subset of a string to a function is as easy as adding an integer to the string pointer, rather than requiring a memcpy and length calculation.
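
A minimal sketch of two of the idioms described above, assuming null-terminated input. The helper names `find_char` and `tail_from` are hypothetical and only illustrate the single-pointer loop and the pointer-arithmetic tail:

```c
#include <stddef.h>

/* Hypothetical helper: a strchr-style scan that advances only one pointer
 * and stops at the terminating 0 byte. */
const char *find_char(const char *s, char c)
{
    for (; *s != '\0'; s++) {
        if (*s == c)
            return s;
    }
    return NULL;
}

/* Hypothetical helper: the tail of a string is plain pointer arithmetic.
 * The caller must guarantee that offset does not exceed the string length. */
const char *tail_from(const char *s, size_t offset)
{
    return s + offset;
}
```

Nothing is copied in either helper, but `tail_from` has no way to check its precondition, which is the other side of the trade-off discussed in this thread.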

philwelch · 16y ago

Null-terminating arbitrary data to turn it into a string is great--if your arbitrary data doesn't have any 0 bytes in it naturally.

What strtok does is just as easily handled by treating strings as a 2-item struct: a size_t for length and a pointer (separating the size data from the character array). In fact, that would kick ass: you can non-destructively tokenize, pass around substring arguments, etc.
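
One way the 2-item struct could look, as a sketch only. `struct str_view` and `next_token` are hypothetical names, not a standard C API; the point is that tokenizing never writes a 0 byte into the buffer:

```c
#include <stddef.h>

/* Hypothetical (pointer, length) string view; not a standard C type. */
struct str_view {
    const char *ptr;
    size_t      len;
};

/* Splits off the next token delimited by `delim` without modifying the
 * underlying buffer. Returns the token and advances *rest past it and past
 * one delimiter, if present. An empty *rest yields an empty token. */
struct str_view next_token(struct str_view *rest, char delim)
{
    struct str_view tok = { rest->ptr, 0 };

    while (tok.len < rest->len && rest->ptr[tok.len] != delim)
        tok.len++;

    /* Skip the token and, if the scan stopped at one, the delimiter. */
    size_t consumed = tok.len < rest->len ? tok.len + 1 : tok.len;
    rest->ptr += consumed;
    rest->len -= consumed;
    return tok;
}
```

Because each token is just a pointer and a length into the original data, the same buffer can be tokenized again later, and it may contain 0 bytes.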

jasonwatkinspdx · 16y ago

On modern processors simple arithmetic like incrementing is effectively free. Control dependence is expensive. Data dependence is somewhere between. Fat pointers do not add any dependence. Storing the end has little cost given shadow registers.

If this had been adopted 20 years ago the likely outcome would be that CPUs might have fractionally more integer resources. It wouldn't affect the wall-clock speed of any code.

All of the operations you describe are done just as easily with fat pointers, with the additional advantage of being applicable to truly arbitrary data, not just null-clean strings.

abrahamsen · 16y ago

It is really an instance of the same problem. Walter Bright's "fat pointer" proposal would give you length-embedded strings for free.

WalterBright · 16y ago

Length-embedded strings are little better than the 0-terminated ones. You cannot take a substring without copying.

The "fat pointer" approach has been used in D for nearly 10 years now, and has proven itself to be very effective.

InclinedPlane · 16y ago

> You cannot take a substring without copying.

That's not a bug, it's a feature. Indeed, immutable strings should almost always be the default choice due to security concerns alone. C is optimized for the rare case; modern languages with immutable strings (like C#) are optimized for the common case, which is how it should be.

coliveira · 16y ago · 4 in thread

I think the preprocessor is the biggest mistake. It was introduced to address issues of separate compilation in a simple way, but it has caused more trouble than it is worth.

__david__ · 16y ago

I understand the issues with the pre-processor, but I still think C is better off with it than without it. I know it can be abused horrifically but it can also be abused in really nice and convenient ways. If you construct your macros right they can be useful or even wonderful. In that sense the pre-processor is very C-like.

The things that make macros actually nice are some of the gcc extensions, like the ({ }) block expression syntax and typeof().
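
A small sketch of what those two extensions buy, assuming a compiler that supports GNU C (gcc or clang). The macro names `MAX` and `NAIVE_MAX` are illustrative, and `__typeof__` is used as the always-available spelling of typeof:

```c
/* MAX evaluates each argument exactly once. It relies on two GNU C
 * extensions (statement expressions and __typeof__), so it is not ISO C. */
#define MAX(a, b) __extension__ ({      \
    __typeof__(a) max_a_ = (a);         \
    __typeof__(b) max_b_ = (b);         \
    max_a_ > max_b_ ? max_a_ : max_b_;  \
})

/* The classic function-like macro, shown for contrast: an argument with
 * side effects may be evaluated twice. */
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))
```

With `MAX(i++, j)` the increment happens once; with `NAIVE_MAX(i++, j)` it can happen twice when the first argument is the larger one.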

agazso · 16y ago

Macros can easily be abused, but the point of the preprocessor today is platform-dependent code generation. In higher-level languages you can select with an if which branch gets executed, but in the end the platform-dependent code of that language's runtime is written in C with ugly #ifdefs.

vorador · 16y ago

What issues does the preprocessor have?

duairc · 16y ago

I disagree. I've only recently started programming in C properly, having previously programmed only in various dynamic scripting languages, Haskell and Java. Macros are by far my favourite feature of C.

rbranson · 16y ago · 4 in thread

I don't think arrays are "converted" to pointers. Arrays are simply a cleaner way of doing pointer arithmetic and allocating large(r) blocks of the stack. Nothing is lost in this "conversion." The array never knows its own dimensions beyond the time you declare it. It's up to the developer to keep track of that.

agazso · 16y ago

Here is an example that caused me a few bugs.

  // define a new type called md5_t
  typedef char md5_t[33];
  md5_t g_md5;
  // here sizeof(g_md5) == sizeof(md5_t)
  
  void f(md5_t md5)
  {
    // here sizeof(md5) == sizeof(char*)
  }

The type information is definitely lost inside functions.
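
The effect can be checked with a compilable sketch. The function names and `struct md5_box` below are hypothetical; they show the array parameter being adjusted to a pointer, plus two common ways to keep the size (a pointer to the whole array, or a struct wrapper). Compilers typically warn about the first function, which is the point:

```c
#include <stddef.h>

typedef char md5_t[33];

/* Hypothetical wrapper: a struct keeps its full size across calls. */
struct md5_box {
    char hex[33];
};

/* The parameter is adjusted to `char *`, so sizeof reports the pointer size. */
size_t size_seen_as_array_param(md5_t md5)
{
    return sizeof(md5);
}

/* A pointer to the whole array preserves the element count in the type. */
size_t size_seen_via_array_pointer(md5_t *md5)
{
    return sizeof(*md5);
}

/* A struct member array is never adjusted to a pointer. */
size_t size_seen_as_struct_param(struct md5_box box)
{
    return sizeof(box.hex);
}
```

Only the first function loses the length; the other two still see all 33 bytes.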

ori_b · 16y ago

Simple rule: Don't use sizeof on pointers. And don't typedef arrays to make them look like simple types.

rbranson · 16y ago

This works, sure, but if you do:

        char test[1024][32];
        /* sizeof yields a size_t, so print it with %zu rather than %i */
        printf("%zu\n", sizeof(test));

You'll get 1024 * 32 (32,768). It didn't know that it's a 2D array. When you pass a stack-allocated array, it passes by pointer. It works this way because C is portable assembler.

ori_b · 16y ago

Arrays are converted to pointers on the first use. You can't ever use an array in C.

shadytrees · 16y ago · 3 in thread

See also the C FAQ, which patiently devotes 24 questions to the topic. (You can almost tell just how frequently the question came up on the list.)

http://c-faq.com/aryptr/index.html

weaksauce · 16y ago

Do buffer overflows occur anywhere other than in char* buffers with no bounds checking? If not, this single fact is what leads to so many of the software vulnerabilities in the wild.

tptacek · 16y ago

Yes, memcpy overflows are just as common as strcpy overflows, and structure overflows are more common today than strings, if only because most of the trivial string stuff has been flushed out by now.

btilly · 16y ago

There are others, but that case covers most of the ones seen in practice.

xcombinator · 16y ago · 3 in thread

Thank god for this mistake; it is what makes C good at what it does: low-level programming. It just passes addresses between functions. Light and fast, no abstractions.

I love it, a way of doing assembler-like coding but multiplatform.

If I want high-level programming I will program in another language, but when you want machine control you have C without all the bloat.

barrkel · 16y ago

There would be little lost from having to specify &arr[0], rather than having array-typed arr degrade into a pointer directly, but a huge amount to be gained - some very much needed help with tracking array sizes.

10ren · 16y ago

C combines the power and performance of assembly language with the flexibility and ease-of-use of assembly language.

However, I was amazed to find that modern assembly language (since I was last in the game 25 years ago) has many high-level concepts in it (structures, loops, conditions etc), and looks... suspiciously... C-like.

But you're quite right about portability. Although C is famously not perfectly portable (int sizes, all those #defines - just some of the issues Java tackled), it is a hell of a lot more portable than an actual assembly language. :-)

ori_b · 16y ago

> (structures, loops, conditions etc), and looks... suspiciously... C-like.

Which assembly? All the ones I know of - especially the modern ones - have become even lower level over time, as compilers started liking more regularity over more powerful instructions.

CrLf · 16y ago · 3 in thread

People are permanently trying to "fix" C, but C has nothing to fix.

It is a limited language, both by the constraints at the time of its creation, but also by the problem space where it has been used over the years. And that's how it should be.

C is part of an ecosystem of languages; it doesn't have to be changed to accommodate the latest fads or to fix problems that never stopped it from being widely used for decades.

If C doesn't fit a purpose, don't use it. You don't even have to stray too far, since there are a few languages that basically are just C with extras.

gchpaco · 16y ago

We have been awash in buffer overflows and other, similar errors (printf strings come to mind) that are actually impossible in a safer language for years. SQL injection can happen in a safer language but you can't take over the web server by doing them. There is nothing fundamental about system languages that requires unsafe array operations. This is a flaw, and it is a flaw of C specifically and a flaw inherited by many C-descended languages. This is not some ivory tower thing that was discovered after C was designed; it was apparent even at the time (although Pascal's fix was pretty bad, variable length arrays fix it neatly). There are compiler articles from the late 70s and early 80s pointing out how even a naïve compiler could easily optimize out bounds checking in most operations!

Retric · 16y ago

If you design the software correctly then array bounds checking is often a waste of resources. For a stupid example let's assume you have 3 arrays of the same size and you are doing this.

  for (i = 1; i < 10000; i++)
  {
    a[i] = i * i;
    b[i] = a[i] * i;
    c[i] = b[i] * i;
  }

Now that's not a lot of code but with array bounds checking you add 50,000 bounds checks that do nothing useful if the arrays are of the correct size. Clearly there are uses where those bounds checks are useful, but when you care about speed they can become fairly costly.

You might even want to rewrite it like this, because it really is faster:

  for (i = 1; i < 10000; i++)
  {
    /* chained assignments keep the same values: a[i] = i^2, b[i] = i^3, c[i] = i^4 */
    c[i] = i * (b[i] = i * (a[i] = i * i));
  }

PS: Ugly C code often has a very good reason for looking the way it does.
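
For comparison, a sketch of the same loop with a single range test hoisted in front of it, which is roughly the effect the compiler optimizations mentioned upthread aim for. The function `fill_powers` and its `capacity` parameter are hypothetical, and `long long` is an assumption made here because i^4 overflows a 32-bit int long before i reaches 10000:

```c
#include <stddef.h>

/* Hypothetical helper: fills a[i] = i^2, b[i] = i^3, c[i] = i^4 for
 * 1 <= i < n. A single up-front test of n against the shared capacity
 * stands in for the per-access checks. Returns 0 on success, or -1 if
 * the arrays are too small. */
int fill_powers(long long *a, long long *b, long long *c,
                size_t capacity, size_t n)
{
    if (n > capacity)
        return -1;              /* one check covers every access below */

    for (size_t i = 1; i < n; i++) {
        long long v = (long long)i;
        a[i] = v * v;
        b[i] = a[i] * v;
        c[i] = b[i] * v;
    }
    return 0;
}
```

The loop body is unchanged from the unchecked version, so the cost of safety here is one comparison per call rather than five per iteration. Whether a given compiler performs this hoisting automatically is not something this sketch demonstrates.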

InclinedPlane · 16y ago

People are trying to create a language that fits the niche that C is supposed to fill in today's world but just barely misses the mark (fast, low-level, but still sane and more or less modern). To a lot of people it looks like the easiest way to create a language that fills that niche is to fix the bugs in C rather than create something new from scratch.

Luyt · 16y ago · 2 in thread

When I was reading this, I thought "Nooo! Don't make the size of an array part of its type!" Brian Kernighan rightly showed that to be a very bad idea, see http://www.lysator.liu.se/c/bwk-on-pascal.html

Luckily the proposal is about passing a 'fat pointer', really a pointer and a length. I did that often in my C programs too: int process(char *buf, int buflen);

Maybe this fix to C's Biggest Mistake, a.k.a. the 'fat pointer', is just syntactic sugar.

nimrody · 16y ago

I don't think Walter suggested having the array size part of the type (static array types).

He suggested using "fat pointers" -- pointers along with their extent. This is similar to how many Pascal compilers treat the type "String".

Kernighan mentions Pascal strings in his article but claims the solution does not scale to other types. Walter's solution does work for all array types (but admittedly has other problems).

chmike · 16y ago

Making the size part of the type is a useful feature in some cases. The compiler can benefit from this information to optimize the data structure and its manipulation.

The exponential growth of CPU processing power has led people to discard such optimizations in favor of code simplicity. But handheld devices with limited energy, computing and storage capacity may put them back into perspective.

asciilifeform · 16y ago

Lack of array bounds checking is not a problem with C.

It is a problem with our hardware.

Introducing bounds checking without introducing a penalty on array access time is impossible on our "C machines".

C/C++ are often thought of as "close to the metal" - but they are close to particular varieties of metal - those designed to run C/C++. We arrived at them through historical accident. There are many other ways to build a computer - and it is not entirely obvious that a "C architecture" is necessarily the simplest or most efficient:

http://www.loper-os.org/?p=46

That a language which is "close to the metal" is braindead is solely a consequence of braindead metal.

The "C architecture" is a universal standard, to the extent that it has become the definition of a computer to nearly everyone. This is why you will never find the phrase "C architecture" in a computer architecture textbook. And yet it is a set of specific design choices and obsolete compromises, to which there are alternatives.

kssreeram (OP) · 16y ago

I feel the lack of a module system is the biggest mistake in C. It is tiresome to prefix every single public function: list_append, list_delete, hashmap_insert etc.

giardini · 16y ago

C's biggest mistake would have to be C++.
