All to avoid having to incur a 1-3 byte per-string overhead or figure out how to efficiently work around a 255-character limit.
If you want to turn some arbitrary data into a string, just put a 0 byte where you want it to end.
Instead of having to increment and compare both a counter and a pointer (or store a final pointer for comparison) in string-manipulation operations, you just increment the pointer. It makes for very concise loops in strchr, etc.
Tokenizing a string is just a matter of throwing down 0 bytes where the tokens are (as strtok does).
Passing a trailing subset of a string to a function is as easy as adding an integer to the string pointer, rather than requiring a memcpy and length calculation.
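A rough sketch of the two points above (the pointer-only loop and passing a tail by pointer arithmetic). The names find_char and after_first_slash are made up for illustration; this is not the actual strchr from any libc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strchr-style search: the pointer is the only loop state,
   and the terminating 0 byte is the only stop condition. */
static const char *find_char(const char *s, char c)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == c)
            return s;
    }
    return NULL;
}

/* Hypothetical helper: the part of a path after the first '/', or the
   whole string if there is no '/'. No copy and no length calculation. */
static const char *after_first_slash(const char *path)
{
    const char *slash = find_char(path, '/');
    return slash ? slash + 1 : path;
}
```

The returned pointer aliases the original buffer, so it is only valid as long as that buffer is.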
strtok is just as easily handled by treating strings as a 2-item struct: a size_t for the length and a pointer (separating the size data from the character array). In fact, that would kick ass: you can non-destructively tokenize, pass around substring arguments, etc.
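Something like the following is what that could look like. It is only a sketch: struct str_view and next_token are invented names, and the delimiter handling (skipping runs of delimiters) is an assumption chosen to resemble strtok.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat-pointer string: the length is kept apart from the bytes. */
struct str_view {
    size_t len;
    const char *ptr;
};

/* Split the next token off the front of *rest.
   *rest is advanced past the token; the underlying bytes are never
   modified. Returns 1 if a token was found, 0 when input is exhausted. */
static int next_token(struct str_view *rest, char delim, struct str_view *tok)
{
    assert(rest != NULL && tok != NULL);

    /* Skip leading delimiters, as strtok does. */
    while (rest->len > 0 && *rest->ptr == delim) {
        rest->ptr++;
        rest->len--;
    }
    if (rest->len == 0)
        return 0;

    /* The token is a view into the original buffer, not a copy. */
    tok->ptr = rest->ptr;
    tok->len = 0;
    while (rest->len > 0 && *rest->ptr != delim) {
        rest->ptr++;
        rest->len--;
        tok->len++;
    }
    return 1;
}
```

Because no 0 bytes are written, this works on string literals and other read-only data, which strtok cannot do.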
If this had been adopted 20 years ago, the likely outcome would be that CPUs might have fractionally more integer resources. It wouldn't affect the wall-clock speed of any code.
All of the operations you describe are done just as easily with fat pointers with the additional advantage of being applicable to truly arbitrary data, not just null clean strings.
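To illustrate the "truly arbitrary data" point, here is a minimal sketch of a fat pointer over raw bytes. struct byte_slice, count_byte and slice_from are hypothetical names, not taken from D or any existing library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer over raw bytes: a 0 byte is ordinary data. */
struct byte_slice {
    const unsigned char *ptr;
    size_t len;
};

/* Count occurrences of value, including value == 0, which a
   NUL-terminated string could not even represent. */
static size_t count_byte(struct byte_slice s, unsigned char value)
{
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.ptr[i] == value)
            n++;
    }
    return n;
}

/* Suffix starting at offset: the fat-pointer equivalent of ptr + offset. */
static struct byte_slice slice_from(struct byte_slice s, size_t offset)
{
    assert(offset <= s.len);
    struct byte_slice tail = { s.ptr + offset, s.len - offset };
    return tail;
}
```

Taking a suffix still costs no copy; it is one addition and one subtraction instead of one addition.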
The "fat pointer" approach has been used in D for nearly 10 years now, and has proven itself to be very effective.
That's not a bug, it's a feature. Indeed, immutable strings should almost always be the default choice due to security concerns alone. C is optimized for the rare case; modern languages with immutable strings (like C#) are optimized for the common case, and this is how it should be.
The things that make macros actually nice are some of the gcc extensions, like the ({ }) block expression syntax and typeof().
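For anyone who hasn't seen those extensions, a small sketch follows. It is GNU C only (gcc and clang accept it, ISO C does not), and MAX_ONCE and demo_max_once are made-up names.

```c
#include <assert.h>

/* GNU C extensions: ({ ... }) is a statement expression whose value is
   its last expression, and __typeof__ names the type of an expression.
   Together they let a macro evaluate each argument exactly once.
   MAX_ONCE is a hypothetical name used for illustration. */
#define MAX_ONCE(a, b) __extension__ ({     \
    __typeof__(a) max_a_ = (a);             \
    __typeof__(b) max_b_ = (b);             \
    max_a_ > max_b_ ? max_a_ : max_b_;      \
})

static int demo_max_once(void)
{
    int i = 1;
    int m = MAX_ONCE(i++, 0);   /* i++ runs once, not twice */
    assert(i == 2);
    return m;
}
```

With the classic `((a) > (b) ? (a) : (b))` macro, the same call would increment i twice.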
// define a new type called md5_t
typedef char md5_t[33];
md5_t g_md5;
// here sizeof(g_md5) == sizeof(md5_t)
void f(md5_t md5)
{
// here sizeof(md5) == sizeof(char*)
}
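One possible workaround, sketched here with hypothetical function names, is to pass a pointer to the whole array instead of letting the parameter decay:

```c
#include <assert.h>
#include <stddef.h>

typedef char md5_t[33];

/* A pointer to the whole array keeps the array type, and so its size. */
static size_t size_via_array_pointer(md5_t *md5)
{
    assert(md5 != NULL);
    return sizeof(*md5);    /* sizeof(md5_t) */
}

/* This is what a parameter declared as "md5_t md5" is adjusted to. */
static size_t size_via_decayed_param(char *md5)
{
    return sizeof(md5);     /* sizeof(char *) */
}
```

The caller then has to write size_via_array_pointer(&g_md5), and the compiler will complain if the argument is not exactly an array of 33 chars.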
The type information is definitely lost inside functions.
char test[1024][32];
printf("%zu\n", sizeof(test));
You'll get 1024 * 32 (32,768). It doesn't know that it's a 2D array. When you pass a stack-allocated array, it is passed by pointer. It works this way because C is portable assembler.
I love it: a way of coding like in assembler, but multiplatform.
If I want high-level programming I will program in another language, but when you want machine control you have C without all the bloat.
However, I was amazed to find that modern assembly language (since I was last in the game 25 years ago) has many high-level concepts in it (structures, loops, conditions etc), and looks... suspiciously... C-like.
But you're quite right about portability. Although C is famously not perfectly portable (int sizes, all those #defines - just some of the issues Java tackled), it is a hell of a lot more portable than an actual assembly language. :-)
Which assembly? All the ones I know of - especially the modern ones - have become even lower level over time, as compilers started liking more regularity over more powerful instructions.
It is a limited language, limited both by the constraints at the time of its creation and by the problem space where it has been used over the years. And that's how it should be.
C is part of an ecosystem of languages; it doesn't have to be changed to accommodate the latest fads or to fix problems that never stopped it from being widely used for decades.
If C doesn't fit a purpose, don't use it. You don't even have to stray too far, since there are a few languages that basically are just C with extras.
for (i = 1; i < 10000; i++)
{
    a[i] = i * i;
    b[i] = a[i] * i;
    c[i] = b[i] * i;
}
Now that's not a lot of code, but with array bounds checking you add about 50,000 bounds checks (five array accesses per iteration) that do nothing useful if the arrays are of the correct size. Clearly there are uses where those bounds checks are useful, but when you care about speed they can become fairly costly.
You might even want to rewrite it as follows, because it really is faster:
for (i = 1; i < 10000; i++)
{
    c[i] = i * (b[i] = i * (a[i] = i * i));
}
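For what it's worth, when the length travels with the pointer, the per-access checks in a loop like this can in principle be replaced by one check before the loop. A minimal sketch, assuming hypothetical names (struct ll_slice, fill_powers) and long long elements so that small test values of i^4 fit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer for an array of long long. */
struct ll_slice {
    long long *ptr;
    size_t len;
};

/* Store i^2 in a[i], i^3 in b[i] and i^4 in c[i] for 1 <= i < n.
   Returns -1 without writing anything if any slice is shorter than n,
   0 on success. */
static int fill_powers(struct ll_slice a, struct ll_slice b,
                       struct ll_slice c, size_t n)
{
    assert(a.ptr != NULL && b.ptr != NULL && c.ptr != NULL);

    /* One check up front covers every index the loop will touch. */
    if (n > a.len || n > b.len || n > c.len)
        return -1;

    for (size_t i = 1; i < n; i++) {
        long long v = (long long)i;
        a.ptr[i] = v * v;
        b.ptr[i] = a.ptr[i] * v;
        c.ptr[i] = b.ptr[i] * v;
    }
    return 0;
}
```

Whether a given compiler for a bounds-checked language actually hoists the checks this way is a separate question; this only shows that the information needed to do so is there.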
PS: Ugly C code often has a very good reason for looking the way it does.
Maybe this fix to C's Biggest Mistake, a.k.a. the 'fat pointer', is just syntactic sugar.
He suggested using "fat pointers" -- pointers along with their extent. This is similar to how many Pascal compilers treat the type "String".
Kernighan mentions Pascal strings in his article but claims the solution does not scale to other types. Walter's solution does work for all array types (but admittedly has other problems).
The exponential growth of CPU processing power has led people to discard such optimizations in favor of code simplicity. But handheld devices with limited energy, computing, and storage capacity may put them back into perspective.
It is a problem with our hardware.
Introducing bounds checking without introducing a penalty on array access time is impossible on our "C machines".
C/C++ are often thought of as "close to the metal" - but they are close to particular varieties of metal - those designed to run C/C++. We arrived at them through historical accident. There are many other ways to build a computer - and it is not entirely obvious that a "C architecture" is necessarily the simplest or most efficient:
That a language which is "close to the metal" is braindead is solely a consequence of braindead metal.
The "C architecture" is a universal standard, to the extent that it has become the definition of a computer to nearly everyone. This is why you will never find the phrase "C architecture" in a computer architecture textbook. And yet it is a set of specific design choices and obsolete compromises, to which there are alternatives.