It was a limitation, because they chose a byte length (to save space). So strings up to 255 characters only. It was decades before folks were comfortable with 32-bit length fields. And that still limited you to 4GB strings. In the bad old days, memory usage was king.
Such a system would effectively remove that feature. Yes, you could disable range checks when indexing into a string, but you still would have to figure out how many length bytes there are. That would only be a little bit faster than a full range check.
Because of that, I don’t see how that would have been useful at the time.
In hindsight, I think the complexity is worth the safety, but I could see why it felt more elegant to use null-terminated strings at the time.
Human concepts are inherently messy. "Elegant" solutions just shove the mess down the road.
The problem is the null termination, which is not general to arrays (though it is sometimes used with arrays of pointers).
Sure, 16 exabytes sounds like a lot today, but so did 4 billion IP addresses. Differently bad is not better.
The NUL terminator is always at least 1 byte, so at best you save sizeof(size_t) - 1 bytes per string — ignoring clever encodings like an LEB128 varint length.
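For reference, a varint scheme like LEB128 spends bytes proportional to the magnitude of the length rather than a fixed size_t; here is a minimal sketch in C (the function name is mine, not from any library):

```c
#include <stdint.h>
#include <stddef.h>

/* Encode value as unsigned LEB128: 7 payload bits per byte, with the
   high bit set on every byte except the last. Returns bytes written. */
size_t uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value)
            byte |= 0x80;   /* more bytes follow */
        out[n++] = byte;
    } while (value);
    return n;
}
```

Lengths under 128 cost a single byte — the same overhead as a NUL terminator — while still allowing arbitrarily long strings.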
This is a classic case of "simple is actually complex". How many billions of dollars have null-terminated strings cost? Hope the 3 bytes of overhead saved per string was worth it.
No matter how you slice it, null termination was a mistake.
void cat_pascal_strings(pascalstr *uninited_memory,
                        pascalstr *left,
                        pascalstr *right);
How big is uninited_memory? Can left and right fit into it? You need to design language constructs around Pascal strings to make them actually safe. Such as, oh, make it impossible to have an uninitialized such object. The object has to know both its allocation size and the actual size of the string stored in it.
What is unsafe is constructing new objects in an anonymous block of memory that knows nothing about its size.
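A minimal sketch of such a counted-string object in C — reusing the thread's pascalstr name, though the layout and helpers here are my own invention: the capacity travels with the object, so concatenation can refuse to overflow instead of trusting the caller.

```c
#include <stdlib.h>
#include <string.h>

/* A string that knows both its allocation size and its current length. */
typedef struct {
    size_t cap;    /* bytes available in data[] */
    size_t len;    /* bytes currently in use */
    char   data[]; /* flexible array member */
} pascalstr;

/* The only way to get one: never uninitialized, cap always correct. */
pascalstr *pstr_alloc(size_t cap)
{
    pascalstr *s = malloc(sizeof *s + cap);
    if (s) {
        s->cap = cap;
        s->len = 0;
    }
    return s;
}

/* Concatenate left and right into dest; fails instead of overflowing. */
int pstr_cat(pascalstr *dest, const pascalstr *left, const pascalstr *right)
{
    if (right->len > dest->cap || left->len > dest->cap - right->len)
        return -1;
    memcpy(dest->data, left->data, left->len);
    memcpy(dest->data + left->len, right->data, right->len);
    dest->len = left->len + right->len;
    return 0;
}
```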
C programs run aground there not just with strings!
    struct foo *ptr = malloc(sizeof ptr); // should be sizeof *ptr!!
    if (ptr) {
        ptr->name = name;
        ptr->frobosity = fr;
        ...
    }

Oops! The wrong size: that allocated only the size of a pointer — 4 or 8 bytes, typically, nowadays — but the structure is 48 bytes wide. "struct foo" itself isn't inferior to a Pascal RECORD; the problem comes from the wild and loose allocation side of things.
Working with strings in Pascal is relatively safe, but painfully limiting. It's a dead end. You can't build anything on top of it. Can you imagine trying to make a runtime for a high-level language in Pascal? You need to be in the driver's seat regarding how strings work.
You mean like the strings in Delphi? Yeah, I can, since I use them daily. Strings in Delphi nowadays are actually more like classes in Java than old Pascal strings. Then, depending on your intent, they end up as either arrays or old-style strings after the linker goes over your code. Best of both worlds — and on top of that, if you really want, you can definitely shoot yourself in the foot with unsafe operations. So in the end it's the best of both worlds and the worst of a third, though for that third one you really have to go out of your way to make it as bad as C strings are.
I doubt string representation is really the blocker here, since C strings are now pretty much only used by some, but not all, C programmers. QString, GString, C++ std::string, Rust strings, Go strings, Java strings, and so on are not null-terminated.
Better yet, how about Modula-2? I can't help but think that the programming language landscape would be much better if that language occupied the niche that C does today.
This is why whenever I use sizeof, I pass a type, not a variable.
Like I get why it happened. It is just crazy how long it has stuck around.
Strings as implemented in e.g. Borland Pascal were better. But then, the length-prefixed implementation had its own downsides. For example, it had to decide how many bits to use for the length. 16-bit Pascal would generally use a single byte, and in BP at least, you could even access it as a character via S[0]. Thus, strings were limited to 255 characters (256 bytes including the length byte) — and because this was baked into the ABI, it wasn't something that could be easily changed later.
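The BP layout described above can be modeled in C. This sketch is my own modeling, not actual Borland headers, but it shows why the limit is baked in: the count lives in a single byte at index 0.

```c
#include <string.h>

/* Borland-style ShortString: byte 0 is the length, bytes 1..255 hold
   the characters, so 255 characters is a hard ceiling of the layout. */
typedef unsigned char ShortString[256];

static void ss_assign(ShortString s, const char *src)
{
    size_t n = strlen(src);
    if (n > 255)
        n = 255;                 /* truncate: the count byte can't go higher */
    s[0] = (unsigned char)n;     /* the S[0] access the comment mentions */
    memcpy(s + 1, src, n);
}
```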
Hence when Delphi decided to fix it, they basically had to introduce a whole new string type, leaving the old one as is. And then they added a bunch of compiler switches so that "string" could be an alias for the new type or the old, as needed in that particular code file.
> None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled `*e'. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
[…]
> C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type. Some costs accrue from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user. Nevertheless, C's approach to strings works well.
* https://www.bell-labs.com/usr/dmr/www/chist.html
He mentions Algol 68 and Pascal [Jensen 74].
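The cost Ritchie concedes — "search for the end of a string" — is visible in the classic strlen loop: linear scan to find a length that a prefixed representation would return in constant time. A minimal version:

```c
#include <stddef.h>

/* Finding the length of a terminated string means scanning every
   byte until the NUL marker: O(n) in the string's length. */
size_t terminated_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}
```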
I personally don't think that the qualitative pros/cons of the chosen approach or alternatives that we're discussing today, 30-ish years later, would be all that new to the designers of C in 1993. The difference is that we've had 30-ish years to watch those decisions play out over millions of lines of code in software running at scales and levels of complexity that programmers in 1993 could only dream of.
Also, software security was barely an issue in 1993. Today, it's a massive issue.
That was him reflecting on things in 1993, but the C team designed things in ~1970. That was basically the Stone or Iron Age of computing.
(OK, it's hard to compare; Code Complete and other much later stuff might be just as good. Too many decades between when I read them to say for sure.)