Linux eliminates the strncpy API after six years of work, 360 patches

(phoronix.com)

242 points · simonpure · 16h ago · 232 comments

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

mrlonglong15h ago· 59 in thread

The zero-terminated string is, I think, computing's biggest mistake. Pascal-style strings were much safer.

There is a middle ground that Visual Basic (and then COM) took, with the BSTR type: It’s still a pointer to a zero-terminated char array, but there is a length field immediately preceding the first pointed-to byte. This is still compatible with a C string (assuming no embedded null characters), but BSTR-typed functions can take advantage of the length value.
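As a rough illustration of that layout, here is a minimal C sketch of a string whose length is stored immediately before the first pointed-to byte while the data stays NUL-terminated. It is not the real BSTR type: the names `lps_new`, `lps_len`, and `lps_free` and the 4-byte `uint32_t` prefix are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [uint32_t length][bytes...][NUL].
 * The returned pointer addresses the first data byte, so it can be
 * handed to any function that expects an ordinary C string. */
char *lps_new(const char *src)
{
    size_t len = strlen(src);
    if (len > UINT32_MAX)
        return NULL;                           /* does not fit the prefix */
    unsigned char *block = malloc(sizeof(uint32_t) + len + 1);
    if (block == NULL)
        return NULL;
    uint32_t n = (uint32_t)len;
    memcpy(block, &n, sizeof n);               /* length prefix */
    memcpy(block + sizeof n, src, len + 1);    /* data plus terminating NUL */
    return (char *)(block + sizeof n);
}

/* O(1) length: read the prefix stored just before the data.
 * memcpy is used so the read does not depend on alignment. */
size_t lps_len(const char *s)
{
    uint32_t n;
    memcpy(&n, s - sizeof n, sizeof n);
    return n;
}

/* Release the whole block, starting at the prefix. */
void lps_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

A pointer returned by `lps_new` works with `strlen` or `printf("%s")`, while code that knows about the prefix can call `lps_len` instead of scanning for the terminator.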

tremon1h ago

> This is still compatible with a C string

Strictly speaking, it's not alignment compatible from CString to BSTR unless you declare all strings to be at most 255 characters or the cpu architecture doesn't require aligned access for multi-byte words (like x86). The BSTR alignment must match the alignment of the length word, meaning you can't convert a randomly-aligned C string to BSTR by simply attaching a prefix in-place.

Also, having the length embedded in the value rather than in the pointer makes it impossible to create BSTR (sub)slices without performing a memcpy. Fat pointers do not have this restriction.

BobbyTables28h ago

Partly agree but there would have been squabbling on the data type of the size, unless it was variable length. The latter would have had other issues too.

For a while, 16bit would probably have seemed too extravagant. Now 32bit would probably seem too small.

For a “strongly typed” language, C is pretty damn loose where it would have mattered.

DarkUranium8h ago

I like the D approach where arrays are just `struct { size_t length; T* ptr; }` internally --- and strings are just arrays of `immutable(char)`.

It has a big advantage over the Pascal approach in that you can do zero-copy slicing, since the length is separate from the actual data.

And `size_t` makes perfect sense for the length here. If your strings are longer than the address space (which `size_t` technically isn't, but is practically very strongly correlated to it), then you're going to have a problem regardless of the number of bits for the length anyway.
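The same idea can be sketched in C. This is only an illustration of a length-plus-pointer view and zero-copy slicing, not how D implements it; `str_slice`, `slice_from_cstr`, `slice_sub`, and `slice_eq` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fat-pointer string view: the length travels with the
 * pointer, not with the data. */
typedef struct {
    size_t length;
    const char *ptr;
} str_slice;

/* View an existing C string; the one strlen call happens here. */
str_slice slice_from_cstr(const char *s)
{
    str_slice v = { strlen(s), s };
    return v;
}

/* Sub-slice [start, end) without copying any bytes.
 * Out-of-range bounds are clamped to the parent slice. */
str_slice slice_sub(str_slice v, size_t start, size_t end)
{
    if (end > v.length)
        end = v.length;
    if (start > end)
        start = end;
    str_slice out = { end - start, v.ptr + start };
    return out;
}

/* Byte-wise equality; no terminator is needed or consulted. */
int slice_eq(str_slice a, str_slice b)
{
    return a.length == b.length && memcmp(a.ptr, b.ptr, a.length) == 0;
}
```

Taking `slice_sub(slice_from_cstr("hello world"), 6, 11)` yields a view of "world" that points into the original buffer, which is the zero-copy property a length prefix stored with the data cannot offer.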

astrobe_5h ago

This only makes a difference in terms of memory size, not in terms of speed, because for decades processors and compilers have been optimized for moving bytes around.

But one would note that in order to gain memory for this particular case of slicing, one introduces 2 extra words (size and pointer) for every other case. Like perhaps the second most common string operation, concatenation. In those other cases, the benefit is slightly negative.

I've had extensive experience with "counted strings" because I implemented a bunch of Forth interpreters, which also use this scheme. Including the common trick of using counted and zero-terminated strings, which is the worst of both worlds in the end. Forth is the kind of language that quickly shows you how bad your choices are.

I eventually dropped all that and adopted ASCIIZ strings because they are generally more efficient (if you pay attention to the strlen() performance pitfalls) and having a dead simple interface with the rest of the world (OS, libraries) is more valuable.

zeroonetwothree6h ago

C is not really strongly typed.

tialaramex5h ago

"Strongly typed but weakly checked"

It turns out that the machine is much better at the sort of boring mechanical tasks where thoroughness counts and imagination doesn't and so languages which do more, and more, and more checking pay off very well. Rust's borrowck is the obvious first thought today but say WUFFS will check that you've proved certain key properties, WUFFS doesn't need to insert runtime bounds checks for example because you've proved, before the code would compile, that you don't have any bounds misses. You might have proved it by writing bounds checks yourself of course, or likely you have an inherent mathematical rationale for why your algorithm has no misses, but either way the compiler checked your work.

poly2it8h ago

No, there would not have been and this is most likely not the reason. size_t exists for precisely this use case. It has existed since C89.

smackeyacky10h ago

Zero terminated strings were the basis for an awful lot of useful software. Calling them the biggest mistake in computing is a bit OTT.

I haven’t programmed anything Pascal related for 30+ years but I dimly remember thinking at the time that I wished the string system wasn’t so hard to use.

asdfasgasdgasdg9h ago

That useful software would not have been less useful if the strings in it were represented as size + buf.

ComputerGuru9h ago

That argument isn’t valid. The argument would be “this string design enabled a whole lot of useful software” but that’s a different matter. (And it could very well be the case.)

zzrrt6h ago

Lead was the basis for an awful lot of useful gasoline. Doesn't mean it was the only solution or the best one.

JdeBP6h ago

A more accurate re-phrased version of the original is that they are the biggest mistake in the C language.

* https://news.ycombinator.com/item?id=48614913

* https://news.ycombinator.com/item?id=24454369

* https://news.ycombinator.com/item?id=1014533

dmazzoni9h ago

255 characters ought to be enough for everybody, right?

Conscat6h ago

Clang and GCC both let you use Pascal strings in C if you would like (with `\p`). But Pascal strings aren't that useful today because the maximum length is too short.

jxbdbd6h ago

Why would a pascal string be any shorter than a C string?

A C string is one pointer reaching all of memory, a Pascal string is two pointers reaching all of memory

bc_programming5h ago

A pascal string is a single byte with the length, followed by the data.

Some implementations use more bytes for the length data, such as Delphi which changed over to a 4 byte prefix length, though those aren't technically Pascal strings anymore. I can't find anything about a Pascal string being two pointers?

Conscat5h ago

A Pascal string has a leading length byte. Because that is one byte, the text can't exceed 255 characters.

layer810h ago

Almost as bad as newline-terminated lines. ;)

bsder13h ago

Zero terminated string is a special case of sentinel value termination.

And sentinel value terminations make a lot of sense when you have punch cards and fixed length records that you need to carve into pieces.

Nobody expected any decisions they were making in the 1960s and 1970s to have any bearing on computing a half-century later. They all expected to have their mistakes long papered over by smarter people at some point.

But we ALL make the mistake of underestimating inertia.

jackbucks15h ago

It was definitely an interesting way to allocate pointers. I did once have a very large project where devs didn't understand this and resolved hundreds or more off-by-one errors and memory overwrites in C due to this feature.

But at the same time, I think blaming the software was kind of a cop-out. Devs were in a hurry and simply didn't respect the rules. Given today's software engineers at large, nerfing programming languages so they can't destroy things might not be a bad idea. But AI will nerf everything.

fragmede15h ago

why is AI gonna nerf everything? sure it could be used as the easy button, but I just spent two hours this morning learning about the neuroscience of how memory works in the brain that I didn't mean to and now I want to run studies on how memory works.

Why do you assume that AI is gonna nerf everything?

AnimalMuppet13h ago

AGI might. AI? No way.

See, AI was trained on existing data - on all that existing C code out there (sure, and also on all the papers and articles saying what was wrong with that C code). Those bugs are in the training data, and often not marked as bugs. So when AI generates C code, is it going to avoid making the mistakes that human code made? No, it's going to generate the kind of code it was trained on. How could it be otherwise?

That's not going to nerf anything.

lelanthran5h ago

> The zero-terminated string is, I think, computing's biggest mistake.

No. They had trade-offs to make, and sentinel-based sequences are a needed thing, even outside of strings.

The mistake was that ISA designers never looked at what HLLs needed and then added the necessary instructions (I posted more about this below).

Even NULL is not a big mistake, when looked at in context of the time in which it was developed.

msla14h ago

In addition to having to pick a size for the length counter and then, later, having to differentiate between lengths in bytes, codepoints, and glyphs, you can't subdivide a Pascal string using pointer arithmetic. To pass just the end of a string into a function, you have to either copy the tail of one Pascal-style string to another with a smaller size value, or your string has to be a struct with an integer and a pointer to the actual data instead of just an integer stuck on the beginning of the string. The first is a lot of copying in some cases, the second raises the specter of structs with invalid pointers. That's not to mention the potential problems that would cause with caches.

cornholio13h ago

You can have a universal variable length field, for example 2 bytes for strings < 32768, then four bytes, 8 bytes etc. On the critical short string path, it costs just a single bit test. The glyph vs byte issues need to be dealt with in both formats.

The subdivision issue is a good perspective, but I would argue the performance impact of cloning substrings is dwarfed by the redundant full string reads to find length.

lelanthran5h ago

> You can have a universal variable length field, for example 2 bytes for strings < 32768, then four bytes, 8 bytes etc.

To hold the length of a string, I'd do something similar to unicode:

7-bits for size + 1-bit for continuation, then 15 bits for size + 1 bit for continuation, then 23-bits for size + 1 bit for continuation, etc.

Or maybe even do it exactly the same as unicode:

    0XXX XXXX -> length of string is in those 7 bits
    10XX XXXX  XXXX XXXX -> length of string is in those 6+8 bits
    110X XXXX  XXXX XXXX  XXXX XXXX -> length of string is in those 5+8+8 bits
    ...

> On the critical short string path, it costs just a single bit test.

A few more clock cycles compared to NULL-termination, although my alternatives above require even more clock cycles.

If the hardware had instructions for sentinel values, things would be easier (Like how DOS calls used '$' termination for strings) and safer.

Load a sentinel byte into a register and have dedicated copy and compare instructions that each take two addresses (src and dst) and copy (or compare) src/dst until the terminator is reached (with copy copying the sentinel as well).

Considering that sentinel values are needed so often, and are so useful, it's surprising that this is not in any ISA. What we have now is kludgy workarounds in the HLL for this. It's hard to blame the HLL, because some workaround has to be implemented.
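In software, the copy operation described above might look like the following C sketch. The function name `copy_until` is hypothetical, and the `cap` argument is an addition that the proposed instruction does not have; it is there so a missing sentinel cannot run past the destination.

```c
#include <stddef.h>

/* Hypothetical software version of a "copy until sentinel" instruction.
 * Copies bytes from src to dst up to and including the first sentinel.
 * cap bounds the copy (an assumption added for safety).
 * Returns the number of bytes copied, or 0 if no sentinel was found
 * within the first cap bytes. */
size_t copy_until(char *dst, const char *src, char sentinel, size_t cap)
{
    for (size_t i = 0; i < cap; i++) {
        dst[i] = src[i];
        if (src[i] == sentinel)
            return i + 1;   /* count includes the sentinel itself */
    }
    return 0;
}
```

With `'$'` as the sentinel this behaves like the DOS-style termination mentioned above; with `'\0'` it is a bounded string copy.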

estebank13h ago

The third option is to have a variable width length: the top most bit signals whether the next byte corresponds to the length or to the start of the string.

pjc506h ago

.. which is why you need a second type, the one dotnet calls "Span". A substring.

fragmede13h ago

compared to Von Newman versus Harvard architecture for LLMs? I think that's a far bigger mistake.

pjc506h ago

Neumann, and .. what? In what way?

fragmede5h ago

Prompt injection only works because there aren't two streams of input to give to the LLM. Von Neumann being the architecture with a single shared memory for both data and instructions. If there were a clean way for the LLM to distinguish between system messages vs user messages, we wouldn't have that problem.

themafia15h ago

> Pascal style strings were much safer.

The limitations were brutal. Initially you could only have 255 bytes in a string. The length of a string and the size of the allocation are now separate and you may need to think about that unused memory in your design. The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.

If you want to create an array of strings you either need to specify the length of all strings and accept the memory overhead or have an array of pointers to strings. If you use an array of pointers you may end up choosing to use the 'nil' value as a sentinel that means "end of list." So we're right back where we started.

Because someone decided to downvote this HN has limited the speed at which I can reply. This site is tragic and I'm fully done with it now. You can spread propaganda and poorly sourced zeitgeist and be among friends but if you try to have a genuine conversation about programming languages you are made to be unwelcome immediately. Screw this.

> No other data structure works like this.

The linked list.

> You can't mess this up in an array

C happily decomposes arrays into pointers. You can erase your length information from the type. This was an intentional decision.

> Strings are the only data structure that assume there will be a NULL at end.

Which is why almost every string API has a version that allows you to specify the maximum length. The fact that you can use a NUL doesn't mean you have to. Which is why the concept of "sentinel values" is broadly used in many types of applications you haven't considered here.

dare9449h ago

> You can spread propaganda and poorly sourced zeitgeist and be among friends but if you try to have a genuine conversation about programming languages you are made to be unwelcome immediately.

Indeed. And the ignorance of computing history in this discussion is particularly disturbing.

The context of this particular thread is "zero terminated string is ... computing's biggest mistake". This completely ignores the situation on the ground when C was developed. At the time, people were striving for a system programming language that sat above the level of assembly but was compact enough to run within the limited resources of the then emerging mini-computer systems. The PDP-11 on which C was developed was certainly not the first mini-computer, but it was among the earliest to have a regular enough instruction set and addressing model to make a general purpose, high-level systems language possible. These systems were extremely limited in memory; the PDP-11's instruction set is limited to directly addressing at most 64KiB (code and data) and many systems of the era were hardware limited to less than that. (Indeed, I regularly run an early version of Unix, including an early C compiler, on my PDP-11/05 which is maxed out at 56KiB [of actual core]). There was no way that even a brilliant engineer like Dennis Ritchie was going to be able to shoe-horn in "optional" types, or the mechanics of length-value strings into a compiler that has to run in such limited space, and produce code (e.g. the Unix kernel) that has to run in even less. The fact that strings and arrays are thin abstractions on top of pointers is both a brilliant compromise in design as well as a nod to then-prevalent assembly practice. It was exactly the kind of pragmatic decision that was needed to move computing along at the time. Of course the designs from this era are antiquated now. But they were not mistakes.

rswail4h ago

The C code for strcpy is:

    while (*d++ = *s++)
         ;

On a PDP-11 that is:

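    L:  MOVB (R1)+, (R2)+   ; copy one byte, *d++ = *s++ (MOVB, not the word-sized MOV)
        BNE L               ; repeat until the byte just copied was the NUL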

pjmlp6h ago

All those limitations were sorted out in 1978 with Modula-2 and open arrays, aka spans.

What about the UNIX and C folks propaganda of C being the first systems language, or always focusing on the original Pascal used for teaching and not everything else that followed up with Mesa, Modula-2, Ada, Object Pascal and friends, none of them with said limitations.

rswail4h ago

C was specifically developed to allow Unix to be ported.

It was a systems programming language and the first well known/successful one.

There was BCPL and then B before that, which is why the language is called "C".

Pascal was considered a teaching language, along with "Algorithms + Data Structures = Programs" by Wirth etc.

The UCSD P-system was one of the first "IDEs" and used Pascal and a bytecode interpreter of the compiled code.

Modula-2 was barely available in the early 1980s.

Ada was mired in MIL-SPEC and expensive compilers etc.

People used FORTRAN for scientific programming, C for most everything else in the non-IBM mainframe world.

pjc506h ago

> You can erase your length information from the type. This was an intentional decision.

Well yes, but given the number of security issues the argument is that it was in retrospect the wrong decision.

AlienRobot14h ago

>The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.

That isn't really a problem.

The problem with null-terminated strings is specifically what happens when you reach the end of the allocated array and there ISN'T a NULL character.

Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character, he can exploit pretty much any standard string manipulation function being used elsewhere in the program to manipulate whatever memory comes AFTER the string data structure.

No other data structure works like this. You can't mess this up in an array, because no function that manipulates arrays is just going to keep going until there is a null. That would be stupid because it would require users of the function to add a NULL to the end of their arrays before passing it to the function, so instead we just pass the size of the array to everything. Strings are the only data structure that assume there will be a NULL at end.

By the way, I read once that if you use UTF-32 every code point will be 4 bytes, constantly, but even then a single code point isn't necessarily a single character. Text is just complicated.

tredre313h ago

> No other data structure works like this.

In C most data structures work like this, you keep going until you find NUL (character) or NULL (pointer). E.g. Strings, array of pointers, linked lists, etc. Of course you can add length to most of those, but it isn't the canonical/traditional way of doing things.

lelanthran4h ago

> Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character,

What sort of situation are you envisioning where a hacker can remove the sentinel (in the case of nul-termination) but not modify the length bytes (in the case of fat pointers)?

imtringued2h ago

If I zero out the destination buffer of a strcpy, and the string is longer than the destination buffer I will run into a buffer overflow problem despite every byte being a zero byte. The absence or presence of the zero byte doesn't seem to be the deciding factor.

BigTTYGothGF13h ago

> Your string size is in bytes and you need to track characters separately

No worse than C strings then.

dietr1ch15h ago

I think it was NULL itself. It was a long way until we realised we don't want invalid values and could use the type system to help us use special values safely.

jkrejcha13h ago

The problem here is that null kinda is consequential of intentional design of the type system itself. In this way, I do think that null was discovered, rather than invented. Remember, C is a kinda "portable assembler" so the constructs in it are based relatively closely to how low level data structures are mapped out in memory.

This is, and continues to be, an incredibly useful feature that makes C and C structs immensely useful concepts. Part of that does need an invalid value[1]. NULL is convenient for this and although there are some very weird JavaScript-trinity-meme-style consequences for this[2], it's such a useful concept that basically all languages that have the ability to construct pointers have a null pointer[3].

The alternative world looks like everyone inventing their own invalid values. Invalid, non-null, pointers are typically MUCH worse than null pointers for debuggability and security. If you unintentionally read/write/execute memory at 0x0 (by far the most common value for NULL), most operating systems will trap this, whereas may not necessarily if 0x12345678 is your invalid value.

[1]: Stuff like IA64 had NaT bits which were effectively an extra bit for what I assume to be this sorta thing. The problem with this is that it costs an extra bit. I don't really know much about IA64, but presumably [NaT 1] + [don't care] would be your null pointers here. I think?

[2]: Really what the standard, in my opinion, should have done is probably not make use of the null pointer UB for many different functions. A lot of compilers took the UB surrounding that to make incredibly dubious "optimizations" that broke stuff with zero actual performance benefit whatsoever

[3]: Yes, even Rust. Although some (again in my opinion) unfortunate design decisions made it so that C-Rust FFI isn't zero cost because of how it treats spans/slices

imtringued2h ago

>[3]: Yes, even Rust. Although some (again in my opinion) unfortunate design decisions made it so that C-Rust FFI isn't zero cost because of how it treats spans/slices

If Rust slices already make you sad, then the thing I'm cooking up will make you cry for days.

atherton9402711h ago

Genuinely curious, how would you handle cases where a value is unset without NULL? This is a legitimate case that happens a lot in eg data modeling

pdimitar11h ago

Sum types, of course.
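C has no built-in sum types, but the idea can be approximated with a tag plus a payload. A minimal sketch, with `maybe_int` and its helpers being hypothetical names; unlike a real sum type, nothing here forces the caller to check the tag before reading the value.

```c
/* Hypothetical "maybe int": a tag says whether a value is present. */
typedef enum { MAYBE_NONE, MAYBE_SOME } maybe_tag;

typedef struct {
    maybe_tag tag;
    int value;          /* meaningful only when tag == MAYBE_SOME */
} maybe_int;

/* Construct the "unset" state without any magic sentinel value. */
maybe_int maybe_none(void)
{
    maybe_int m = { MAYBE_NONE, 0 };
    return m;
}

/* Construct the "set" state; every int, including 0, is a valid payload. */
maybe_int maybe_some(int v)
{
    maybe_int m = { MAYBE_SOME, v };
    return m;
}

/* Read the value, or a caller-supplied fallback when unset. */
int maybe_get_or(maybe_int m, int fallback)
{
    return m.tag == MAYBE_SOME ? m.value : fallback;
}
```

Languages with real sum types add the part C cannot: the compiler rejects code that reads the payload without first handling the "none" case.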

clnhlzmn11h ago

The way we do it in modern languages with things like std::optional and even that is not the best example.

jibal10h ago

They already said:

> use the type system to help us use special values safely

... but this is not the place to explain what a type system is or what sum types/maybe/optional/etc. are.

bellowsgulch13h ago

Compared to scripting languages with actual tagged types, C doesn't really have a type system, and that's readily apparent to anyone who has written C in the last 43 years and debugged a program written in it.

C pretends types exist with you, but once bytes hit the road, it's all real-life and segmentation faults.

AlotOfReading10h ago

C actually does have a type system and it's one of the bigger issues with the language. If it didn't, unaligned pointers and signed overflow would be totally fine.

DarkUranium7h ago

By that logic, no natively-compiled language has a type system.

Though I should note that in a way, even some ISAs have one, what with e.g. separate float vs integer registers.

jkercher14h ago

Meh, I think NULL is fine in C. It's an extra, valid state to represent pointers at no cost. Unlike the more hand holdy languages, it's quite rare for a pointer in C to have the ability to be NULL since, more often than not, it's pointing at something known. It's actually quite rare to see NULL checks unless it's API code or something like that. I can see this being more of a problem in a managed language where anything can be NULL at any time.

bvrmn13h ago

NULL as a concept is fine. Inability to declare something as non-null is not.

There is a huge gap between the developer expectation that "it's pointing at something known" and the hard reality confirmed by zillions of CVEs. That's the reason optionality is prevalent in modern languages and type checkers (python, typescript); nowadays even Java has sane non-nullable types.

kelnos14h ago

> to represent pointers at no cost

I wouldn't call "cause of bugs and security issues" "no cost".

> it's quite rare for a pointer in C to have the ability to be NULL

As a C programmer for more than 25 years, that is the exact opposite of my experience.

none_to_remain12h ago

Struct foo has various members, including a bar*. But a foo may or may not be associated with a bar. If there's no associated bar, the bar* pointer is NULL. Seen and done this all the time

UqWBcuFx6NV4r13h ago

This precise mindset is why the world has suffered for decades (wrt security/integrity/availability) at the hands of what can only be described as an industry led by completely unjustified male confidence. Why are there still people fighting the “it’s not that bad, guys! you’ve just got to be a good developer like ME!” fight?

XorNot13h ago

The problem with let's get rid of NULL is that it's a real, required state. The vast majority of computing is actually not binary: any real input generally has at least 3 possible states: not set, true and false.

In practice really 4 because "indeterminate" is a reasonable error condition you'd like to know about.

And it keeps increasing anyway: e.g. not set has subcategories: not set due to lack of user input, not set because we're loading state from the backend etc.

NULL is the first expression of that basic problem: it's definitely not enough to eliminate NULL, because the first thing that happens is your non-pointer default value takes its place.

senfiaj13h ago· 18 in thread

I wonder, why not use a string buffer paired with its length? For example, maybe use a struct that has a char pointer and 2 ints (occupied length + total buffer length). Almost like C++'s std::string. This null terminator thing really sucks, it's potentially insecure and often unperformant.
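A minimal C sketch of such a buffer, under a few assumptions: the names `strbuf`, `strbuf_append`, and `strbuf_free` are made up for this example, `size_t` is used instead of `int` for the two lengths, and growth is by doubling.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer: pointer + used length + capacity. */
typedef struct {
    char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} strbuf;

/* Append n bytes from src, growing the allocation geometrically.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (the buffer is left unchanged in that case). */
int strbuf_append(strbuf *b, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->len)
        return -1;                      /* len + n would overflow */
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) {
                new_cap = need;         /* cannot double any further */
                break;
            }
            new_cap *= 2;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);   /* no scan for a terminator */
    b->len = need;
    return 0;
}

/* Release the storage and reset the buffer to the empty state. */
void strbuf_free(strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

A zero-initialized `strbuf` is a valid empty buffer, and each append knows where the data ends without rereading it.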

WalterBright11h ago

Wonder no longer!

https://dlang.org/spec/arrays.html#dynamic-arrays

and

https://dlang.org/spec/arrays.html#strings

and for C:

https://digitalmars.com/articles/C-biggest-mistake.html

maxlybbert9h ago

It's definitely possible. And common, at least in some projects. The only real drawback is that sloppiness will lead to multiple slightly different nonstandard string types in the same project.

bnolsen11h ago

That's called a fat pointer. Null-terminated C strings account for the majority of memory errors out there.

GalaxyNova13h ago

Yes I have seen it happen a few times with `strlen` being called in a loop silently causing O(N) to turn to O(N^2)
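A small C sketch of that pattern, using hypothetical function names. Whether the first version is actually quadratic depends on whether the compiler can prove the length does not change; because the loop body writes to the string, it generally cannot hoist the call.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Potentially O(N^2): strlen(s) is evaluated in the loop condition,
 * so the whole string may be rescanned on every iteration. */
void upcase_slow(char *s)
{
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}

/* O(N): the length is computed once, before the loop. */
void upcase_fast(char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}
```

Both functions produce the same result; only the amount of rescanning differs.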

jkrejcha11h ago

Reminds me of an article[1] that described how he cut GTA Online loading times by 70% because strlen was getting called for effectively every character in a string

[1]: https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

sweetjuly11h ago

I remember reading this blog post when it was first published, but the subsequent updates are better than I would've ever expected this to turn out. Worth checking it out again if you've seen it before :)

senfiaj12h ago

Exactly, you can't write clean concise code when working with C strings. Almost every C string manipulation requires cognitive load: "Is the buffer size enough (including null terminator), should I reallocate it?", "I need to have the offset from the last concat, to make next concats performant", "Umm, should I put null terminator at i or i + 1?"... It really sucks, it's akin to death by a thousand cuts.

sgerenser10h ago

Joel Spolsky coined the term “Shlemiel the Painter’s Algorithm” for this type of thing back in 2001: https://www.joelonsoftware.com/2001/12/11/back-to-basics/

none_to_remain12h ago

The size overhead of that is 2*sizeof(int) while the overhead of null termination is sizeof(char). If I remember the standard right, the former is worse by at least sizeof(char), and usually more in practice. This used to matter, sometimes still does.

kgeist11h ago

I would assume the difference is mostly negligible in practice due to the allocator rounding up the allocated memory size at least by the word size anyway (for alignment and simpler bookkeeping). You can also use variable-length encoding in the header to use 1 byte for most cases, similar to how UTF-8 does it: if the most significant bit is not set, we assume a 7-bit encoding, which can represent string lengths up to 127 using 1 byte, which is probably 99% of strings.

senfiaj12h ago

Well, not saying to always use it, but if the string size is big enough, the overhead of 2 ints becomes relatively vanishing. For generic dynamically sized strings it probably has more advantages than disadvantages. But in any case, sure, if every single byte matters or some structure requires specific memory layout, then fine. I just don't think these things are the majority of use cases. Keep in mind that the cached lengths can increase performance, since you don't have to recalculate string lengths.

lelanthran4h ago

> Well, not saying to always use it, but if the string size is big enough, the overhead of 2 ints becomes relatively vanishing.

In that case, the fix is not to change C strings (breaking a lot of existing code), but to introduce a stringbuilder type.

ekaryotic2h ago

I am a terrible hobby c programmer that doesn't understand pointers but surely a symmetric approach doesn't have the overhead or the bug. that is to say that if the language was designed to work in single bit pairs of a string character in conjunction of a string length character assuming a fail safe design of one dummy string character then if a bug happens in the code then there's no overflow because the length can never be shorter than the character.

chiph12h ago

Pascal did/does this, but eventually someone wants a string longer than the size portion can handle. Or wants the number of characters not the number of bytes.

jerf11h ago

I wasn't a programmer in these days, so I don't know if there's some other major concern that would kill this, but I sometimes wonder about whether we could have / should have used variable-length integers. That is, something like, 0-127 byte strings get their length prefixed, 128 - 16383 get two bytes of prefix, and the probably-rare 16384 - 2097151 strings would end up with three, though proportionally by that point it's hardly anything. Or you could use the UTF-8 mechanism for packing the bytes, though that costs more and probably doesn't get anything we'd care about in the 1980s or 1990s.

It's a bit of extra code, yes. Not necessarily all that much, but some. On average it is only slightly more expensive than null termination, and considered as a proportion of the size of the strings themselves it's hardly anything. It's probably better than the strings getting hard-limited to 0-255, though, which was quite frequently a user-visible quirk.
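For a sense of how much code that is, here is a C sketch of one possible variable-length prefix: 7 bits of length per byte, low bits first, with the top bit marking that another byte follows. This particular byte order is an assumption, and the names `varlen_encode` and `varlen_decode` are hypothetical. It gives the sizes mentioned above: one byte up to 127, two up to 16383, three up to 2097151.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes len as 7 bits per byte, least significant group first.
 * The top bit of a byte is set when more bytes follow.
 * out must have room for up to 10 bytes (enough for any uint64_t).
 * Returns the number of bytes written. */
size_t varlen_encode(uint64_t len, unsigned char *out)
{
    size_t n = 0;
    while (len >= 0x80) {
        out[n++] = (unsigned char)((len & 0x7F) | 0x80);
        len >>= 7;
    }
    out[n++] = (unsigned char)len;
    return n;
}

/* Reads a length written by varlen_encode from at most avail bytes.
 * Returns the number of bytes consumed, or 0 if the prefix is
 * truncated or longer than 10 bytes. */
size_t varlen_decode(const unsigned char *in, size_t avail, uint64_t *len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        value |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {      /* last byte of the prefix */
            *len = value;
            return i + 1;
        }
    }
    return 0;
}
```

The short-string path is a single byte and a single bit test; the loop only runs more than once for strings of 128 bytes or longer.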

pjmlp6h ago

And then anyone that isn't stuck in 1976 will use open arrays.

Johanx6410h ago

Dude, every sane language out there does this. Just generally with 4byte prefix. Null-terminated stuff has always been backwards compat stuff.

Pascal strings - historically and why people even remember this being an issue - were up to 255 chars in size, if not you had to use different string type.

You might still want raw pointers for all sorts of low level stuff, but you almost never want to have null-terminated strings for anything but back-compat, one of the worst things ever, even on memory constrained systems.

MBCook11h ago

A lot of them are strings coming from or going to user space right? So wouldn’t you have to do constant conversions?

Animats11h ago· 5 in thread

This is a job for Claude!

What happens if you turn a job like that over to Claude Code? A mess? Good results? Code bloat? Worth trying on existing C programs.

qarl10h ago

I ran a test where I added a "light" mode to xscreensaver: unique changes to over 270 different C programs.

It mostly did an amazing job in a short period of time.

EDIT: Of course I get downvoted for saying this. HN isn't interested in reality any more.

krupan9h ago

These stories with no details and no proof are not interesting or helpful.

qarl9h ago

My apologies. I did not want to be downvoted for promoting my own material.

https://github.com/qarl/qscreensaver

1 more reply

ninjin9h ago

> Of course I get downvoted for saying this. HN isn't interested in reality any more.

I suspect that rather many of us are simply just tired of Claude and friends getting shoehorned into any conversation about programming at this point. It is about as fun as the Rust Brigade entering any discussion about C. It adds nothing new to the discussion and it is frankly tiring since we pretty much at any time have a handful of conversations on the front page already covering "AI" topics anyway (counting four at the time of writing this).

qarl9h ago

Well - except in this conversation it's incredibly relevant. It took six years to do this work when the work is likely mostly mechanical and could have been done much more quickly and safely with an automated system.

I thought automation would be interesting to HN - given the context and the fact it was not used.

2 more replies

naturalmovement14h ago· 4 in thread

A reminder that we've had strlcpy[1] for ~ 30 years but it was never accepted into the Linux world because of typical petty open source bullshit. This is why we can't have nice things.

[1] https://man.openbsd.org/strlcpy

ericbarrett13h ago

The Linux kernel had strlcpy over 20 years ago. It was removed in favor of strscpy because the latter was judged a better interface. Here's a 2022 article: https://lwn.net/Articles/905777/
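To make the interface difference concrete: strlcpy returns the full length of the source, so it has to read the source all the way to its terminator, while strscpy reports truncation with a negative error code and only reads what fits. The following is a hedged userspace sketch of strscpy-like behaviour, not the kernel's implementation; the name `sketch_strscpy` is hypothetical, and `ssize_t` and `E2BIG` are assumed to come from POSIX headers.

```c
#include <errno.h>       /* E2BIG (POSIX) */
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical userspace stand-in, NOT the kernel's code.
 * Copies at most size-1 bytes of src, always NUL-terminates dst when
 * size > 0, and never reads src beyond its first `size` bytes.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG
 * if src did not fit or size is 0. */
ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return -E2BIG;

    /* Bounded search for the terminator within the first `size` bytes. */
    const char *nul = memchr(src, '\0', size);
    if (nul == NULL) {
        /* Source too long: keep what fits and terminate by hand. */
        memcpy(dst, src, size - 1);
        dst[size - 1] = '\0';
        return -E2BIG;
    }

    size_t len = (size_t)(nul - src);
    memcpy(dst, src, len + 1);   /* the +1 copies the NUL as well */
    return (ssize_t)len;
}
```

A caller can then treat any negative return as "truncated" instead of comparing a returned length against the buffer size, as strlcpy requires.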

avadodin6h ago

Returning an error is better but you're using ssize_t which is a tradeoff.

The race conditions appear to be a result of the Linux kernel implementation but UNIX style syscalls introduce these races by default. It is not an inherent flaw of the API or even the implementation Linux was using.

The only useable C string API has always been memcpy anyways.

BoingBoomTschak14h ago

Actually, glibc 2.38 has it.

naturalmovement13h ago

Wow it only took them 26 years to import a 30 line C function, a third of which is comments?

I should have sent them a nice fruit basket to commemorate the occasion.

qarl11h ago· 4 in thread

Am I going to be the first person to ask this after five hours? Really?

Wouldn't this work be extremely easy to implement with an LLM coder?

qustio9h ago

I don't think the bottleneck was that it took six years to Ctrl-F strncpy and type in new code for each file.

qarl9h ago

It's a shame you're misrepresenting what is actually going on.

In another comment here I explained that I have run a test: asking Claude Code to add a substantial feature to 270 different C programs.

Despite your beliefs - it went extremely well.

qustio9h ago

Huh, are you confusing me with someone else? I don't doubt Claude Code did that, I do the same for refactors all the time.

But xscreensaver theme tweaks for personal use have a much lower standard for quality control, regression testing, side effects, etc than a kernel used by billions of devices with thousands of interconnected drivers and subsystems.

Not to mention the coordination problem to get every maintainer on board and patches approved for each specific area when working on a project of that scale, even for a relatively narrow change.

Claude Code doesn't really help with that so don't see why the expectation would be a significant speed up (and doing it all in a single patch would definitely be rejected).

1 more reply

lelanthran3h ago

> In another comment here I explained that I have run a test: asking Claude Code to add a substantial feature to 270 different C programs.

That's a different scenario, though.

Would Claude have performed adequately if it had to add a specific feature to 270 programs buried in a set of 270m programs, each of which may or may not have a dependency on one or more of the others, with virtually unbounded results to test?

In terms of tokens alone, that would have been cost-prohibitive. But let's assume that you had the money to do this: it still might not even be possible.

You're confusing "I have these 270 independent programs and want to make this change to all of them" with "I have these 270m lines of code, of which only 270 need to be changed".

larodi15h ago· 3 in thread

Wonder when someone is going to be brave enough to fork the Linux kernel and try to ffwd it with automatic programming.

fragmede15h ago

Why would you start there instead of creating something from scratch? If you can port drivers just as easily, meaning you don't especially give a shit about the hardware you're running on in the first place, why even deal with Linux? The battle-tested LRU cache system?

literalAardvark14h ago

It's much easier to use something with all the edge cases already handled as a starting point.

convolvatron14h ago

I've seen several workalike kernels in various stages of completion. at least one of them was able to run some pretty substantial applications (Postgres, nginx, that kind of thing), and that is still I guess around 250kloc. but it only really has drivers to support hypervisor devices.

unfortunately as time goes by, the linux api surface gets larger and more convoluted. so there's going to be some coverage you're just never going to get.

but in the abstract, definitely. linux is so bloated at this point that it's not clear that it can ever be 'made safe'.

thiht56m ago

> In place of strncpy, Linux kernel code should use strscpy() for NUL-terminated destinations, strscpy_pad() for NUL-terminated destinations with zero-padding, strtomem_pad() for non-NUL-terminated fixed-width fields, memcpy_and_pad() for bounded copies with explicit padding, or memcpy() for known-length memory copies

What a nightmare, does it have to be so convoluted?

sirwhinesalot6h ago

I have in the past made fun of the Linux kernel devs, supposedly some of the best C developers in the world, for not knowing how to make stringbuffer and stringview types, but to be fair to them we didn't have the consensus we have today on the topic.

You know who did have the right idea though? Dennis Ritchie, who proposed a fat pointer type for C all the way back in 1990. Would have made for a perfect addition to C99. Imagine how different the world might have been had the committee added that in.

We had a second chance with the release of the "C's Biggest Mistake" article from Walter Bright in 2009, essentially pushing for the same idea as Ritchie (slices/stringviews) but explained in much clearer language.

Alas, didn't make it to C11.

We're now in C23, still nothing. But we did get _Generic and VLAs! Party hard.
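For readers who have not seen the stringview idea spelled out in C, here is a minimal sketch of a pointer-plus-length view with zero-copy slicing. The type and function names (`strview`, `sv_from_cstr`, `sv_slice`, `sv_equal`) are made up for illustration and do not correspond to the Ritchie or Bright proposals or to any standard API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view type: a pointer plus a length. The bytes it refers
 * to do not need a terminator and are not owned by the view. */
typedef struct {
    const char *ptr;
    size_t len;
} strview;

/* Wrap an ordinary C string; the length is computed once, here. */
strview sv_from_cstr(const char *s)
{
    strview v = { s, strlen(s) };
    return v;
}

/* Zero-copy slice of [start, end). Out-of-range bounds are clamped to
 * the view, so the result never points outside the original bytes. */
strview sv_slice(strview v, size_t start, size_t end)
{
    if (end > v.len)
        end = v.len;
    if (start > end)
        start = end;
    strview out = { v.ptr + start, end - start };
    return out;
}

/* Content comparison that uses the stored lengths instead of scanning
 * for a terminator. */
int sv_equal(strview a, strview b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Slicing "hello world" from 6 to 11 yields a view of "world" that shares the original storage; no allocation or copy happens, which is the property a length prefix stored next to the data cannot offer.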

5 more replies

rswail5h ago

Things that have bugged me for 40 years...

* NUL terminated strings (and now, non UTF-8 encoded strings on input/output)

* Using LF or CR or CRLF as line terminators, and pipe/comma-delimited fields when there were other unambiguous ASCII characters that could have been used (eg, GS, FS, RS) that would have made the encoding/decoding of line termination an I/O thing keeping HT/VT/CR/LF/FF as literally print related codes.

4 more replies

WalterBright11h ago

"The strncpy function within the Linux kernel has been a "persistent source of bugs" for years due to counter-intuitive semantics and behavior around NUL termination along with performance issues due to redundant zero-filling of the destination."

Huh. Whenever I've been asked to review C code, I always looked for strncpy and always found a bug with it.

lambdaone12h ago

This sort of boring grind is where the real work of systems engineering is done. Big infrastructure projects like this one, which make the Linux kernel more reliable while keeping it workable throughout the process, move on the scale of decades, not months.

2 more replies

twothreeone10h ago

Wow, very humbling. I'm actually amazed how many people contributed to this. It's easy to get attribution for "cool new features", but arguably removing bad features is even more important for something as fundamental as the kernel. Kudos!

I'm sure these are the sorts of things that will go down as folklore from the "founding ages", when everyone will have forgotten how to understand source code in 50 years and the Claude/Codex cruft just silently keeps piling on and burning the majority of our planet's energy.

3 more replies

cm21878h ago

A lot of pain and suffering to avoid having a string datatype.

2 more replies

GTP1h ago

I always thought that strncpy was the safe alternative to strcpy. Now that I think of it, I'm unsure whether the NUL terminator is counted in strncpy's size or not, which would be a likely source of errors. But could someone explain better what the problems were? And also, is having to pick the right function from the list of given alternatives much better?
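The two behaviours of standard strncpy that cause most of the trouble can be shown in a few lines. This is only an illustrative sketch; the helper names (`has_terminator`, `exact_fit_is_terminated`, `trailing_zero_bytes`, `copy_truncating`) are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size) contains a NUL byte, else 0. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Pitfall 1: the size argument counts the whole destination, and when
 * the source is that long or longer, strncpy writes no terminator.
 * Here a 4-character source exactly fills a 4-byte buffer. */
int exact_fit_is_terminated(void)
{
    char buf[4];
    strncpy(buf, "abcd", sizeof buf);
    return has_terminator(buf, sizeof buf);
}

/* Pitfall 2 helper: counts the zero bytes at the end of buf[0..size).
 * After strncpy of a short source, everything past the copied text
 * has been zero-filled. */
size_t trailing_zero_bytes(const char *buf, size_t size)
{
    size_t n = 0;
    while (n < size && buf[size - 1 - n] == '\0')
        n++;
    return n;
}

/* The usual hand-written workaround: copy at most size-1 bytes and
 * terminate explicitly. Easy to get wrong by one, hence the bugs. */
void copy_truncating(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```

So the terminator is not guaranteed at all when the source is too long, and with a short source the rest of the buffer is always zero-filled, which is wasted work for large buffers.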

2 more replies

stcg2h ago

I wonder what the difficulty in rewriting strncpy uses is that makes it take six years? Was it widespread? Or was it more of a long-running effort, where it was only changed if there were other changes in the same file? Or is there some other thing that makes it difficult?

kstenerud3h ago

strncpy is 99.999% of the time NOT the correct function to call, so this is a huge win.

It's just a shame that such a confusing name was chosen for such a niche use case (fixed width records that require null padding).

rswail4h ago

In all the comments in this thread it's interesting how people confuse:

* NUL: An ASCII non-printing character with the byte value of 0

* NULL: A pointer that does not point to usable memory with the value that compiles in C to be equal to ((void *) 0).

1 more reply

PlunderBunny15h ago

I worked on a Win32 app that used space-padded strings, i.e. the destination string was padded with spaces, but there was still a null on the last byte. You had to use special versions of the string functions for length, copy etc.

I’m not sure why this was - the source base was so old it might have had its origins in Pascal struct behaviour.

3 more replies

D-Coder12h ago

Note that "360 Patches" is 360 uses of strncpy that have been removed, not necessarily bugs.

1 more reply

DerSaidin8h ago

strtomem_pad seems redundant with memcpy_and_pad, and also it requires the preprocessor: https://github.com/torvalds/linux/blob/1a3746ccbb0a97bed3c06...

I was curious: Why have it, instead of just using memcpy_and_pad?

AI's answer (paraphrased) was:

* Avoids possible bugs from manually writing sizeof(dest)

* Enforces the __nonstring attribute

* Signals "I am converting an actual C-string into a fixed-width legacy memory field" rather than "copy binary data and pad it"

Interesting to learn about the __nonstring attribute:

https://github.com/torvalds/linux/blob/1a3746ccbb0a97bed3c06... https://github.com/search?q=repo%3Atorvalds%2Flinux+__nonstr...
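A userspace sketch may help show why a macro wrapper can be useful on top of a plain copy-and-pad function. Everything below is hypothetical and only loosely modelled on the description in this comment: `sketch_copy_and_pad` stands in for a bounded copy with explicit padding, and `SKETCH_STR_TO_FIELD` for a wrapper that takes the field width from `sizeof` so the caller cannot mistype it. It is not the kernel's code and does not reproduce the __nonstring check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: copy `count` bytes of src into a dest_len-byte
 * destination, padding the remainder with `pad`, or truncating if
 * count is larger than dest_len. */
void sketch_copy_and_pad(void *dest, size_t dest_len,
                         const void *src, size_t count, int pad)
{
    if (dest_len > count) {
        memcpy(dest, src, count);
        memset((char *)dest + count, pad, dest_len - count);
    } else {
        memcpy(dest, src, dest_len);
    }
}

/* Length of s, looking at no more than `max` bytes. */
static size_t sketch_bounded_len(const char *s, size_t max)
{
    const char *nul = memchr(s, '\0', max);
    return nul ? (size_t)(nul - s) : max;
}

/* Hypothetical wrapper: `dest` must be a real array (not a pointer) so
 * that sizeof(dest) is the field width. The result is a fixed-width
 * field that is padded but not necessarily NUL-terminated. */
#define SKETCH_STR_TO_FIELD(dest, src, pad)                            \
    sketch_copy_and_pad((dest), sizeof(dest), (src),                   \
                        sketch_bounded_len((src), sizeof(dest)), (pad))
```

The macro is where the preprocessor requirement comes from: a function cannot recover the size of the caller's array, so the width has to be captured at the call site.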

devsda11h ago

Did anybody else misunderstand the title as removing strncpy func for linux users ?

For a moment, I misunderstood it as (g)libc removing strncpy and was worried about the trouble it's going to cause.

jibal10h ago

The purpose of strncpy, which was originally part of the UNIX kernel code, was to copy file names to and from directory entries that consisted of a 2 byte inode number and a 14 byte zero-padded but not zero-terminated name field.
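The directory-entry use case explains both odd behaviours of strncpy: the zero padding fills out the fixed-width field, and a name that uses all 14 bytes legitimately has no terminator. A small sketch of that layout follows; the struct and function names are hypothetical, not the historical UNIX source.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_DIRSIZ 14

/* Layout as described above: a 2-byte inode number followed by a
 * 14-byte name field. */
struct sketch_dirent {
    uint16_t ino;
    char name[SKETCH_DIRSIZ];   /* zero-padded, NOT always terminated */
};

/* Store a name into the fixed-width field. strncpy zero-fills short
 * names and silently truncates names longer than 14 bytes. */
void sketch_set_name(struct sketch_dirent *d, const char *name)
{
    strncpy(d->name, name, SKETCH_DIRSIZ);
}

/* Compare the stored field with a C string. strncmp stops at the
 * field width, so it never runs off the end of an unterminated name.
 * Only the first 14 bytes of `name` are considered. */
int sketch_name_matches(const struct sketch_dirent *d, const char *name)
{
    return strncmp(d->name, name, SKETCH_DIRSIZ) == 0;
}
```

Used this way, strncpy and strncmp form a matched pair for fixed-width fields; the bugs come from treating the result as an ordinary C string.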

I started warning my colleagues against using it the moment I saw it for the first time about 50 years ago.

1 more reply

pjmlp6h ago

Now let's put that work into money, to assess what the cost impact of replacing strncpy() was.