(~~Technically~~ Optionally, C11 has strcpy_s and strcat_s, which fail explicitly on truncation. So if C11 is acceptable for you, that might be a reasonable option, provided you always handle the failure case. Apparently, though, it is not usually implemented outside of the Microsoft CRT.)
edit: Updated notes regarding C11.
But strings in BASIC are so simple. They just work. I decided when designing D that it wouldn't be good unless string handling was as easy as in BASIC.
"Theoretically" is the word you're looking for: they're part of the optional Annex K so technically you can't rely on them being available in a portable program.
And they're basically not implemented by anyone but Microsoft (which created them and lobbied for their inclusion).
C is semantically so poor, I find it hard to understand why people use it for new projects today. C++ is overcomplicated, but at least you can find a good subset of it.
Strings in C are more like a lie. You get a pointer to a character and the hope there is a null somewhere before you hit a memory protection wall. Or a buffer for something completely unrelated to your string.
And that's with ASCII, where a character fits inside a byte. Don't even think about UTF-8 or any other variable-length character representation.
In fairness, the moment you realize ASCII strings are a tiny subset of what a string can be, you also understand why strings are actually very complicated.
One of the big problems with C programmers is they often neglect to check for and handle those failure cases. Did you know that printf() can fail, and has a return value that you can check for error? (Not you, personally, but the "HN reader" you) Do you check for this error in your code? Many of the string functions will return special values on error, but I frequently see code that never checks. Unfortunately, there isn't a great way to audit your code for ignored return values with the compiler, as far as I know. GCC has -Wunused-result, but it only outputs a warning if the offending function is attributed with "warn_unused_result".
I'm not a huge fan of using return values for error checking, but we have the C library that we have.
But also, "strings" and "time" are actually very complex concepts, and these functions operate on often outdated assumptions about those underlying abstractions.
C99 came so very very close with VLAs. You can declare a function like:
int main(int argc, char *argv[argc]) { ... }
But C99 requires the compiler to discard the type annotations and treat the declaration as equivalent to: int main(int argc, char **argv) { ... }
Imagine a world where the C string functions were declared as:
char *strndup(s, n)
size_t n;
const char s[n];
{
/* now we can do sizeof(s) and bounds checking! */
}
(You'd have to use K&R style declarations to get around the fact that the pointer argument comes before the length argument, alas.)
Edit: and then C11 made VLA support optional, since the feature didn't get used much, because the feature was only half-baked to begin with... sigh.
Even in safer languages such as Rust, there are often questions as to why certain string operations are either impossible, or need to be quite complicated for a rather simple operation, and these are then met with responses such as "Did you know that the length of a string can grow from a capitalization operation depending on locale settings or environment variables?"
P.S.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume they are simpler than they are, and code that handles them is thus written on assumptions such as that the length of a string remains the same after capitalization, or that the result is not influenced by environment variables.
I remember thinking about setting the high bit to denote the end of string to save space.
Nowadays the binary for "hello world" might be as big as a whole operating system of the past.
(though honestly I can't recall the size of the OS on a boot floppy, but the original floppies were 160k)
Funny mind thing to forget to increment counters each year.
It has nothing to do with null termination.
And uninitialized memory is not self-describing in any way in the C language. The same is true in machine language.
This is a problem you have to bootstrap yourself somehow if you are to have any higher level language.
The machine just gives you a way to carve out blocks of memory that don't know their own type or size. C doesn't improve on that, but it is not the root cause of the situation. Without C, you still have to somehow go from that chaos to order.
Copying two null terminated strings into an existing null-terminated string can be perfectly safe without any size parameters.
void replace_str(char *dest_str, const char *src_left, const char *src_right);
If dest_str is a string of 17 characters, we know we have 18 bytes in which to catenate src_left and src_right.
This is not very useful though.
Now what might be a bit more useful would be if dest_str had two sizes: the length of string currently stored in it, and the size of the underlying storage. This particular operation would ignore the former, and use the latter. It could replace a string of three characters with a 27 character one.
"What? You mean I can type an arbitrary string and it works? I don't need to worry about terminators or the amount of memory I've allocated? You can concatenate two strings with +?!? What is this magic?"
I still love C, but I'd do my best not to have to write anything serious with it again.
Compared to the alternative (straight assembler) at the time as a systems programming language, C is a massive step up.
Also, the UNIX way was independent processes, so the APIs did not need to be thread safe, as there was no threading in the target architectures.
Now given the massive amount of existing C out there from the time of such architectures, you either have to move the API and language on to make it incompatible with existing code, or support the old baggage. The language has kept compatibility, and in this case, the github peeps have deprecated APIs using macros, so it's a reasonable approach.
An alternative approach would be to move the language on, but by its nature it won't be compatible with C, so you give it a new name. You call it things like Go, or Rust, or Swift. These are all C with the dangerous bits removed. It'll be interesting in 40 years' time to see if people are having the same conversation about these languages - 'OMG, how did people write stuff in Rust? It can't cope with [insert feature of distributed quantum computing]. It's really scary.'
I've been coding in JS on a daily basis for more than 10 years and today I learned there is a `with` statement in JS.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
Edit: well, seems like it's been deprecated/forbidden since ES5 (2009), so it makes sense I've never seen it.
Also, I just want to remind you that JS isn't just React. There are plenty of libraries written in C that introduce breaking changes over the course of 3 years. Nothing will stop people from finding ways to complain about JS though, I know. The hate-boner is very real.
I think stuff has kinda gotten better, but while Unicode had emoji to kinda save the day, dates never had this moment and we're still suffering through major messes on a daily basis because of it.
C's string manipulation functions are a regular source of the worst vulnerabilities in software.
Even if they're in the same category of legacy cruft, they're not even remotely in the same magnitude of consequences.
It's absolutely true that decades ago the C community was complacent, but it's not true now. Source: I taught secure coding in C/C++ in the 00s.
On BSDs and macOS you're always SOL because the syscall api isn't stable and only the C wrappers are.
It's easy to survive: just don't crash. :)
And, functions aside, it's trivial to write a C program that bombs out without calling any functions at all, safe or otherwise.
It's a language from a different era, for sure. Back then no one had the computing power to build Rust. And remember that before C, they were writing Unix in assembly language. So sprintf() was a big step up!
Why can't we just have some nice structures instead?
struct memory {
size_t size;
unsigned char *address;
};
enum text_encoding { TEXT_ENCODING_UTF8, /* ... */ };
struct text {
enum text_encoding encoding;
struct memory bytes;
};
All I/O functions should use structures like these. This alone would probably prevent an incredible amount of problems. Every high-level language implements strings like this under the hood. The only reason C can't do it is the enormous amount of legacy code already in existence...

1. https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
Recall that during the rise of C, people were writing machine code on punch cards. Assembly -> Machine code has far more footbullets than C, it is a tradeoff between hand holding and tiny fast code.
Wow, this blew up.
To all the people popping off about how great other languages are, tell me: when will we see the Unreal Engine written in Python, or Pascal, or Algol, or Rust, or Go... the next big step is WebASM (or .cu), and that's way more footbullet-y than C. And what is the native language all of your sub-30 year old interpreted languages were written in? Thank you!
I know C/C#/Python/Rust/Javascript.
After a decade of using C I am still not totally sure I didn't dangle a pointer somewhere in precisely the wrong way to create havoc. And yeah, that means I have to get better, etc. But that is not the point. The point is that even with a lot of experience in the language, you can still easily shoot yourself in the foot and not even notice it.
Meanwhile, after a month of using Rust I felt confident that I hadn't shot myself in the foot, because I know what the compiler guarantees, e.g. around ownership. While in C shooting myself in the foot happened quite often, in Rust I would have to specifically find a way to shoot myself in the foot without the compiler yelling at me, and quite frankly I haven't found such a way yet.
Javascript is odd, because the type system has quite a few footguns in it. This is why things like Elm and TypeScript exist: to avoid these footguns.
I don't want to take away from the accomplishments of C, and I still like the language, but to claim that shooting yourself in the foot is equally likely in all languages is not true.
The “C competing with assembly” meme was very specific to microcomputer game and operating system development, not more general microcomputer application development, and not to minicomputer or mainframe development.
Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ...
He had some bug where in one place it returned to the start of the string, executed it, and kept going. The end result just happened to be a nop. Had been like that in production for a couple of years.
The reason why C won had little to do with its advantages as a language over the competitors. It just happened to be the systems language for Unix, which was the winner in the early OS wars on microcomputers (for unrelated reasons). Once it became so established, there was a positive feedback loop: you would write portable code in C, because you knew that it was the fastest language that most platforms out there would support. And then any new platform would offer a C compiler, because they wanted to be able to run all the existing C code out there. And so, here were are.
Those of us who have always known about less dangerous 'system' languages (Pascal probably being the most popular) lament the fact that so much code got written in C instead.
It wasn't inevitable. It was preventable! It just didn't happen that way for reasons which are largely historical.
I don't work for the Rust Evangelism Strike Force, my main project is written in (as little) C (as possible), but I beg anyone who has a choice: use something else! Rust is... fine, Zig is promising. Ada still works!
Writing out the set {Python, Pascal, Algol, Rust, Go} tempts me to say uncharitable things about your understanding of the profession, but I accept you were just being snarky so I'll just gesture in the direction of how $redacted that is.
Why would a huge C++ (not C, btw) codebase with roots going back to the 90s be rewritten in any other language?
And in fact how is the language Unreal Engine written in relevant to C having footguns?
Go (golang)
C#
But please, nothing about using unsafe.
strcpy() was replaced with a safer strncpy(), which in turn has been replaced with strlcpy().
The list is a ban of the less safe versions, where more modern alternatives exist.
http://the-flat-trantor-society.blogspot.com/2012/03/no-strn...
C has unsafe basic functions because the programs written then were much simpler, and this sufficed. There's decades of PL research resulting in new languages that give better guarantees than C, allowing you to worry less about wrestling with the language and more on your business logic.
> C programmers don’t trust anything, and they’re better programmers for it.
By that token, frontend JS programmers trust things even less than C programmers, and they're even better programmers for it. \s (In reality, FE JS devs mainly wish that browser environments were more consistent and predictable, and would disagree that they are better developers because of it.)
"../../../../../../../../../../../../../../../../../../../../etc/shadow" is not a file someone would ever reasonably want to access. But is there an easy way to look for nonsense paths without potentially limiting functionality, or writing more code than you wanted to? Nope.
The same footgun exists in all languages; C's design just has a hair trigger.
BTW, in macOS there are "secure bookmarks" (see NSURL docs) that are effectively capability tokens: when user drags a file, or selects it in an Open File dialog (which runs isolated from the app), the kernel creates an app-specific token that grants access to that file to the app, so it can access it beyond its sandbox.
Unfortunately, it is riddled with sharp knives that can cut you, open flames that can burn you, gas that can smother you, water than can drown you and food that can make you sick if you prepare it incorrectly.
Some react to this potential safety threat by banning the use of knives, stoves, sinks, and food from a kitchen.
Fortunately most attempts at safety just require having a microwave to prepare the frozen pizza or Uber Eats delivery.
I've seen the same question posted way too often by beginners to C. "I've created a char*. Why am I getting <random fault> when I try to write to it?"
This in turn stems from the dedicated CPU support for working with zero-terminated strings that traces all the way back to the PDP-11. So what C does here is exactly what it has always been doing - it provides a thin wrapper over the existing hardware functionality.
The variadic arguments are of the same nature - they basically allow for manual call stack parsing, again something that is a level down from the application code.
It's also easy to see how an API like sprintf and scanf came about - someone's just got tired of writing a bunch of boilerplate code to print a float with N decimals aligned to the left with a plus sign. So they threw together a function call "spec" (the format string), added a call stack parsing support (va_args) and - voila - a beautifully concise print/scan interface. It is a very clever construct, you've gotta give it that.
The flip side is that it required people to pay close attention to how they used it, which wasn't that bold a requirement back then. But as time went on, the average skill of C programmers went down, their use of the language did too, and more and more people started to step on the same rakes.
So, here we are. Zero-terminated strings are forbidden and va_args calls are nothing short of magic.
If they had used a {pointer, size} pair instead, it would have avoided all of these string problems, most buffer overflows, even the GTA Online loading problem that was on HN recently.
These days (ptr, size) is probably 16 bytes -- longer than almost all words in the English language (the Scrabble SOWPODS list maxes out at 15 letters). A pointer alone is 8 bytes. Back at the dawn of C in 1970, memory was 7 to 8 orders of magnitude more expensive than today (about 1 cent per bit in 1970 USD). (Today, cache memory can be almost as precious, but I agree that the benefits of bounded buffers probably outweigh their costs.)
8-byte pointers today are considered memory-costly enough "in the large" that even with machines commonly having dozens of GiB, the x32 ABI was introduced to go back to 32-bit addressing, aka 4-byte pointers. [1] There are obviously more pointers than just char* in most programs, but even so.
Anyway, trade offs are just something people should bear in mind when opining on the "how it should be"s and "What kind of wacky drugs were the designers of language XYZ on?!!?".
[1] https://stackoverflow.com/questions/9233306/32-bit-pointers-...
Pascal, to save one byte, limited strings to length 255. Bad decision.
I think the sentinel character was the best choice in hindsight and at the time in that regard.
But I wish the xxx_s versions and strdup would have made it into the standard like 30 years ago.
It is not that there is anything intrinsically wrong with these functions. You can technically use all of them and I have been using all of them, safely, for decades.
The issue is they are huge traps to the point that in a larger piece of software one can say "well, it's just not worth it".
You can go much, much, much further than that.
In a couple of embedded projects I worked on, some of the rules were:
* dynamic allocation after application has started is banned -- any heap buffers and data structures must be allocated at the start of the application and after that any allocation is a compile time error,
* any constructs that would prevent statically calculating stack usage were banned (for example any form of recursion except when exact recursion depth is ensured statically),
* any locks were banned,
* absolutely every data structure must have size ensured, in a simple way, beyond any reasonable doubt,
etc.
Except when you have these rules in Java, the ironic counter-point is "if you are doing this much memory control yourself, you should just use C or C++ or something".
I'll keep your comment in mind next time I see that rebuttal. Thank you.
There is a bunch of misconceptions about Java. Java is actually very performant and memory allocation is generally cheaper than in C (except for inability to have good use of stack in Java). What's slow about Java is all the shit that has been implemented on top of it, but that's another story for another time.
For example, allocation in Java is basically incrementing the pointer. And deallocation for most objects is basically forgetting the object exists.
No, you don't want to "limit the use of new", that's wrong approach.
What you want is to have objects that are either permanent or last very short amount of time.
The worst types of objects are ones that have a kind of intermediate lifetime, i.e. ones that are allowed to mature out of eden. These cost a lot to collect.
The objects that have very short lifetime are extremely cheap to collect.
So if your function takes arguments, creates couple of intermediate objects and then never returns them (for example they were just necessary for inner working of the function) and your function does not call a lot of other heavy stuff, then it is very likely the cost of those temporary objects will be very low. Also, they tend to be allocated very close to each other and so pretty well cached.
- the ternary operator ("?") was strictly forbidden. One had to use the full "if () {..} else {..}" syntax with comments inside each branch even if the branch was empty
- a dynamic array written in an abstract way, when used and implemented specifically for current project had to become a constant static one, with values precalculated and copy/pasted to current project source. This was a fun one to do maintenance work years later.
- magic numbers inside code were forbidden. All numbers had to be defined in a specific header, with an explanation of why that number had said value.
- no variadic parameters. All functions had to take a fixed set of parameters
- macros were to be used as little as possible. Code review sometimes wasted 50% of its time on the overuse of macros that were not already "classic" from the project's point of view
- operator overloading was strictly forbidden. Overloading functions was forbidden too.
The argument was to allocate memory freely and let it pool memory as necessary. Fair enough, it was simpler and fit the standard expectation of development.
The issue is that if you talk with the allocator team they complain of not being able to fix performance issues fast enough due to allocations firing off left and right in the middle of a request.
I never realized that my view of C programming is heavily influenced by MISRA until your comment.
I know game engine programming follows a similar, perhaps unspoken, convention.
Also doesn’t the OS lie? I thought the memory wasn’t really physically assigned until first use.
In both cases the project size is small enough, or the scrutiny is high enough that the ad-hoc allocator doesn't develop. The environment is also simple enough that the memory cheats you're thinking of don't exist (or you can squash them by touching all allocated memory up front).
The goal of these rules is to improve reliability and timeliness of your application. If you intend on working around those rules to do what the rules explicitly forbid then either you or the rules are wrong.
You could maybe call file-scope buffers with a size counter a dynamic memory allocation, e.g. for storing RS232 or CAN messages, since they shrink and grow.
The important thing is that you want to know that flooding one buffer won't flood another, which malloc could result in if it was used for unrelated buffers.
That depends on the OS. Linux lies (overcommits), Windows doesn't. In embedded it's more typical to have a special OS like VxWorks or FreeRTOS that don't lie to you, or to have no OS at all (like basically every arduino project)
How do you ensure that?
(It would have to be in the .c files, not the headers, might not be so clean)
> The ctime_r() and asctime_r() functions are reentrant, but have no check that the buffer we pass in is long enough (the manpage says it "should have room for at least 26 bytes"). Since this is such an easy-to-get-wrong interface, and since we have the much safer strftime() as well as its more convenient strbuf_addftime() wrapper, let's ban both of those.
(https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...)
> The traditional gmtime(), localtime(), ctime(), and asctime() functions return pointers to shared storage. This means they're not thread-safe, and they also run the risk of somebody holding onto the result across multiple calls (where each call invalidates the previous result). All callers should be using their reentrant counterparts.
(https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...)
https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
It actually gives examples and a lengthy explanation and reasoning behind the ban.
If someone wants some fun, try this:
1. Slurp up all the FOSS projects that extend back to 90s or early 2000s.
2. Filter by starting at the earliest snapshot and finding occurrences of strcpy and friends that don't have the "n" in the middle.
3. For those occurrences, see which ones were "fixed" by changing them to strncpy and friends in a later commit somewhere.
4. See if you can isolate the part of the code that has the strncpy/etc. and run GCC on it. GCC -- for certain cases (string literals, I think) -- can report a warning if "n" has been set to a value that could cause an overflow.
I'm going to speculate that there was a period where C programmers were furiously committing a large number of errors to their codebases because the "n" stands for "safety."
If you are doing something like `sprintf(buffer, "%f, %f", a, b)`, yes it is tricky to choose the size of buffer frugally, but if you replace that by `ftoa` and constructing the string by hand, you are likely to introduce more bugs.
Edit: as pointed out in another post, you can do git blame to see the rationale for each ban, quite interesing.
A fun exercise you can do is put a "%s" in the format string, omit the string argument and see what happens to the stack.
I'd say the usual trap is rather the size of the target buffer, because that requires bigger static analysis guns. (I'm ignoring things like "%n", because then you're playing with fire already.)
char buf[2];
sprintf(buf, "%d", n);
This will happily write to buf[2] and beyond if n is negative or greater than 9.
If you're thinking about using it, consider instead:
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings

snprintf or nul-plus-strncat do what you want, but snprintf has portability problems on overflow. Most projects I've been on rely on strlcpy (with a polyfill implementation where not available).
It may actually be a bug that I got the warning, because the range of each input was checked, and I think the compiler is supposed to be smart enough to remember that.
std::string needs some tweaks, but it can mostly be treated as a built in and it wipes out a huge set of C string issues.
However, I look at old books on C, and then I look at this list, and I wonder if it would not have been helpful to, after mentioning that a function was banned, suggest what the replacement is, even as a comment.
It's likely that the authors of this list didn't think the comments would be worthwhile for the audience (git developers).
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
> we provide a compat version, so it's always available
Furthermore, imagine "src" is 1 MB of characters but we only want to copy the first 3. The git implementation would traverse the entire 1 MB to find the length first, but a proper implementation only needs to look at the first 3 chars. So they banned strncpy and provided a worse solution.
[1]: https://github.com/git/git/blob/master/compat/strlcpy.c
(strcpy is just banned because there's no bounds check, and they want to force use of strlcpy instead).
See https://developers.redhat.com/blog/2019/08/12/efficient-stri...
#pragma GCC poison printf sprintf fprintf
Turns out you just can't use them when you contribute code to the Git project. That makes sense, and seems reasonable.
Edit: wait, I can't use strcpy?! Screw that, then I'm not open sourcing my AGI!
https://github.com/git/git/blob/master/object-file.c#L1293
And currently used here (at least):
While I think such rules are a good idea, they only make sense if applied consistently, and their value depends on how religiously the tooling (duct tape and "process") enforces them (even so, you're still only one `#ifdef` away from undoing that "safety"). Having GCC[1] now support static analysis is a killer feature for this type of problem.
On the other end of the spectrum we have Huawei which instead of linting their code is finding creative ways to trick auditing tools and hide such warnings from auditors:
[0] https://news.ycombinator.com/item?id=22712338
[1] https://developers.redhat.com/blog/2021/01/28/static-analysi...
The strncpy() function is less horrible than strcpy(), but
is still pretty easy to misuse because of its funny
termination semantics. Namely, that if it truncates it omits
the NUL terminator, and you must remember to add it
yourself. Even if you use it correctly, it's sometimes hard
for a reader to verify this without hunting through the
code. If you're thinking about using it, consider instead:
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
I just did a search on the keywords 'banned' and 'strncpy' [2]

[0] https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...
[1] https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...
[2] https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...
https://github.com/git/git/commits/master/banned.h
(Git development is done by emailing patches. Those patches include the git commit message, which we can see just by looking at the history of the file. Sometimes there's additional discussion on the ML, but the most important details are in the commit message because the git development team is very disciplined about that.)
https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...
https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...
It would be good to know what the commonly-accepted alternatives are.
For example: https://lgtm.com/rules/2154840805/
Much like with all other forms of effective censorship, I see this as a quick short-term "fix" with hidden long-term costs[1]. IMHO this sort of anti-thinking just leads to even worse, more dogmatic and cargo-cult, programmers who know less and less about the basics and then go on to make even more subtle errors.
Somehow the collective software industry has managed to propagate the notion that people are incapable of doing even basic arithmetic. Yet they think people are capable of creating complex systems with even more subtle behaviour? The justification would normally be because it's not directly affecting security. WTF. It's beyond stupid.
The only C function I think should be truly banned is gets(), because it is actually impossible to calculate what size of buffer it needs. That is not true of any of the others on this list.
[1] By short and long, I mean decades vs centuries.
Static analysis would probably be more robust, but way more involved.
/joke
It should be strncpy(a,b,(size_t)-1)!
- strcpy: no bounds check
- strcat: no bounds check
- strncpy: does not nul-terminate on overflow
- strncat: no major issues, probably to force usage of strlcat
- sprintf: no bounds check
- vsprintf: no bounds check
- gmtime: returns static memory
- localtime: returns static memory
- ctime: returns static memory
- ctime_r: no bounds check
- asctime: returns static memory
- asctime_r: no bounds check
The str functions all have safer alternatives. The time functions have reentrant alternatives, and/or alternatives that provide a bounds check.