Well, it's been fuzzed, so there's a pretty decent chance that the LLVM code has zero exploitable flaws. Also, the C++ demangler isn't as security-sensitive as many other more important projects.
That said, as a general rule raw C-style string parsing, which that C++ code is an example of, is asking for trouble. It's painful to write, painful to maintain, painful to debug, and, when it fails, fails in the worst possible way. In my opinion there's little benefit to using C++ for these workloads.
AFAIK Valgrind can catch many memory access related errors. And static analyzers can point out where such errors could happen.
From a different perspective: why don't people use Pascal-style strings in C or C++ (and similar languages)? I think even I could write a library that does all the string.h things but with length-prefixed strings. People have been complaining for years about C-style strings (which have been widely used for decades), but nobody ever did anything about it?
PS: I, myself, don't find it "painful" to use C-style strings. A few basic precautions (like using strncmp(bla,blaa,buffer_size)) and you're fine. Off-by-one errors are more of a problem, at least for me at night (Valgrind and the LLVM static analyzer point even those out).
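To make the length-prefixed idea concrete, here is a minimal sketch in C. The `lstr` type and its functions are hypothetical names invented for illustration, not an existing library, and only a few string.h counterparts are covered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length travels with the bytes,
 * so no operation has to scan for a terminating NUL. */
typedef struct {
    size_t len;   /* number of valid bytes in data */
    char  *data;  /* not necessarily NUL-terminated; NULL if allocation failed */
} lstr;

/* Copy n bytes from src into a new lstr. */
static lstr lstr_from(const char *src, size_t n)
{
    assert(src != NULL);
    lstr s = { 0, malloc(n ? n : 1) };
    if (s.data != NULL) {
        memcpy(s.data, src, n);
        s.len = n;
    }
    return s;
}

/* Counterpart of strcmp: compare the common prefix, then the lengths. */
static int lstr_cmp(const lstr *a, const lstr *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    int r = n ? memcmp(a->data, b->data, n) : 0;
    if (r != 0)
        return r;
    return (a->len > b->len) - (a->len < b->len);
}

/* Counterpart of strcat: return a new string holding a followed by b.
 * Assumes both inputs were allocated successfully. */
static lstr lstr_concat(const lstr *a, const lstr *b)
{
    lstr s = { 0, NULL };
    if (a->len > SIZE_MAX - b->len)
        return s;                          /* combined length would overflow */
    size_t total = a->len + b->len;
    s.data = malloc(total ? total : 1);
    if (s.data != NULL) {
        memcpy(s.data, a->data, a->len);
        memcpy(s.data + a->len, b->data, b->len);
        s.len = total;
    }
    return s;
}

static void lstr_free(lstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Every function gets the length for free instead of calling strlen, and embedded NUL bytes are just data. The cost is that such strings cannot be handed directly to APIs that expect a terminator.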
True, but Valgrind doesn't exist on every OS that one might need to work with, especially embedded ones, and according to Herb's talk at CppCon 2015, only 1% of the audience was using any form of static analysis tooling.
The world of C and C++ programming, in typical 9-5 enterprise jobs is quite different from HN ideals about code quality.
> I think even I could write a library that does all the string.h things but with length-prefixed strings. People have been complaining for years about C-style strings (which have been widely used for decades), but nobody ever did anything about it?
Because it takes a big effort to make such a library part of ANSI C.
> PS: I, myself, don't find it "painful" to use C-style strings. A few basic precautions (like using strncmp(bla,blaa,buffer_size)) and you're fine.
Your example already contains a possible security hole: you need to guarantee, 100% of the time, that buffer_size never exceeds the actual size of either bla or blaa, because strncmp may read up to buffer_size bytes from each if it doesn't hit a NUL first.
When working with a team it suffices that another person changes your code without accounting for this invariant.
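One way to make that invariant harder to break is to pass each buffer's own capacity instead of one shared size. The sketch below is only an illustration; `bounded_len` and `bounded_equal` are hypothetical helpers, not standard functions, and they use only `memchr` and `memcmp` from string.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the string in buf, looking at no more than cap bytes.
 * Returns cap if no NUL terminator is found inside the buffer. */
static size_t bounded_len(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul != NULL ? (size_t)(nul - buf) : cap;
}

/* Hypothetical helper: true if both buffers hold the same string.
 * Each buffer carries its own capacity, so the caller never has to promise
 * that one shared size fits both. Unterminated buffers never match. */
static bool bounded_equal(const char *a, size_t a_cap,
                          const char *b, size_t b_cap)
{
    assert(a != NULL && b != NULL);
    size_t a_len = bounded_len(a, a_cap);
    size_t b_len = bounded_len(b, b_cap);
    if (a_len == a_cap || b_len == b_cap)
        return false;               /* missing terminator: treat as invalid */
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Called as `bounded_equal(bla, sizeof bla, blaa, sizeof blaa)` on real arrays, the sizes come from the compiler rather than from a convention a teammate has to remember. It still breaks if someone passes a pointer where an array was expected, so it narrows the problem rather than removing it.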
bla and blaa buffer sizes, yes. My strings are mostly file names/paths, which are bounded by NAME_MAX and PATH_MAX. So my excuse here is that all my buffers have the same size (not the best excuse, I know). PS: I always sanitize on input if the input can be anything (network/IPC sockets, for example).
> True, but Valgrind doesn't exist on every OS that one might need to work with, especially embedded ones, and according to Herb's talk at CppCon 2015, only 1% of the audience was using any form of static analysis tooling.
A quick Google search shows some tools for Windows. I don't know how good they are. Clang's static analyzer seems to work on Windows, for static analysis. But if the code compiles on Linux/BSD/OS X, then people should check it there before shipping.
This just reminded me of the latest Linus rant[0].
Then there are fuzzing and formal verification, which are too much for "normal" programs (these are not C-specific).
> The world of C and C++ programming, in typical 9-5 enterprise jobs is quite different from HN ideals about code quality.
The same goes for the world of hobby C and C++ programming. And Python, and Ruby, and Haskell, and... It just shows more in C. But C is still the king when it comes to portable and efficient programs, and that won't change soon (although there is less need for such programs, as "embedded" today equals "has only 512MB RAM").
[0] http://lkml.iu.edu/hypermail/linux/kernel/1702.2/05171.html
> People have been complaining for years about C-style strings (which have been widely used for decades), but nobody ever did anything about it?
There have been many efforts (http://www.and.org/vstr/comparison), but ultimately they all failed to find widespread adoption because they're solving the wrong problem. For any non-trivial data munging operation, experienced C programmers quickly ditch C strings in favor of vectors. The simplest approach employs a simple pointer and length (or boundary) tuple; i.e. a vector. You can get fancy by wrapping them in a struct, but even that isn't necessary and, especially wrt API design, often needlessly forces interface users to create temporary "slice" objects with an annoying type peculiar to the interface. Newer languages offer nicer interfaces for slices, but simple pointers are hard to beat in terms of simplicity and usability. (The only problem with simple pointer derivation and manipulation a la C is that they're difficult for a compiler to both verify the correct use of _and_ to aggressively optimize. You must choose one or the other. Requiring the user to use specialized, compiler-sanctioned primitive aggregate types in languages like Rust is a way to meet the compiler half-way.)
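As a rough sketch of the pointer-plus-boundary style, the following walks the '/'-separated components of a buffer that is never assumed to be NUL-terminated. `next_component` and `count_components` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next '/'-separated component in [*pos, end).
 * On success, store its start and length, advance *pos past it and return 1.
 * Return 0 when no components remain. Empty components produced by
 * repeated slashes are skipped. */
static int next_component(const char **pos, const char *end,
                          const char **start, size_t *len)
{
    const char *p = *pos;
    assert(p <= end);
    while (p < end && *p == '/')
        p++;                                    /* skip separators */
    if (p == end) {
        *pos = p;
        return 0;
    }
    const char *sep = memchr(p, '/', (size_t)(end - p));
    const char *stop = sep != NULL ? sep : end; /* component ends here */
    *start = p;
    *len = (size_t)(stop - p);
    *pos = stop;
    return 1;
}

/* Count the components in a buffer of n bytes. */
static size_t count_components(const char *buf, size_t n)
{
    const char *pos = buf, *end = buf + n, *start;
    size_t len, count = 0;
    while (next_component(&pos, end, &start, &len))
        count++;
    return count;
}
```

Nothing is copied and nothing is terminated: each component is just a start pointer and a length into the original buffer, which is the "vector" described above without any wrapper struct.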
Also, for complex parsing tasks experienced C programmers will often code a straight-up state machine, or leverage a parser generator. In both cases C-style NUL terminated strings aren't even visible in the rearview mirror.
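For a sense of what a straight-up state machine looks like, here is a small sketch that validates a `key=value;key=value` list one byte at a time. The input format, the state names, and `count_pairs` are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* States of a hypothetical parser for input such as "a=1;b=22". */
enum state { ST_KEY_START, ST_KEY, ST_VALUE };

/* Return the number of key=value pairs in buf[0..n), or -1 if the input
 * is malformed. Keys must be non-empty; values may be empty. */
static int count_pairs(const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    enum state st = ST_KEY_START;
    int pairs = 0;

    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        switch (st) {
        case ST_KEY_START:              /* expecting the first byte of a key */
            if (c == '=' || c == ';')
                return -1;
            st = ST_KEY;
            break;
        case ST_KEY:                    /* inside a key, waiting for '=' */
            if (c == ';')
                return -1;
            if (c == '=')
                st = ST_VALUE;
            break;
        case ST_VALUE:                  /* inside a value, waiting for ';' */
            if (c == '=')
                return -1;
            if (c == ';') {
                pairs++;
                st = ST_KEY_START;
            }
            break;
        }
    }
    if (st == ST_KEY)
        return -1;                      /* input ended in the middle of a key */
    if (st == ST_VALUE)
        pairs++;                        /* last pair had no trailing ';' */
    return pairs;
}
```

The input is consumed as a pointer and a length, each byte is examined exactly once, and no string.h function is involved at any point.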
IME, I've found that parsing of data is one of the areas where C excels. And by parsing I don't mean attacking data with regular expressions. Likewise, for creating complex data structures like graphs C excels, especially when you want to employ intrusive data structure patterns for efficiency and clarity. Pointers are wonderful abstractions that way.
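For readers who haven't met the intrusive pattern, a minimal sketch: the link lives inside the element itself, so putting an element on a list needs no separate node allocation. The `token` struct and the helper names are hypothetical; the `CONTAINER_OF` macro follows the well-known idiom built on `offsetof`.

```c
#include <assert.h>
#include <stddef.h>

/* The link is embedded in the element instead of pointing at it. */
struct list_node {
    struct list_node *next;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type: a token that points into a parsed buffer. */
struct token {
    const char *start;
    size_t len;
    struct list_node link;
};

/* Push a node onto the front of the list; no allocation takes place. */
static void list_push(struct list_node **head, struct list_node *node)
{
    assert(head != NULL && node != NULL);
    node->next = *head;
    *head = node;
}

/* Sum the lengths of all tokens on the list. */
static size_t total_len(struct list_node *head)
{
    size_t total = 0;
    for (struct list_node *n = head; n != NULL; n = n->next) {
        struct token *t = CONTAINER_OF(n, struct token, link);
        total += t->len;
    }
    return total;
}
```

Since the list code only ever touches `struct list_node`, the same two functions work for any struct that embeds a link, and an element can sit on several lists at once by embedding several links.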
There are a lot of difficulties with C, particularly regarding memory management. But string processing is not one of them, except for programmers for whom at that moment parsing is synonymous with crude hacks using regular expressions or the limited interfaces for C-style strings. It's self-inflicted. The solution doesn't require a complex framework. Addressing the issue merely requires reframing the task. Fortunately, when reframing is too burdensome, for quick and dirty string hacking there are plenty of alternative languages.