Re: C as used/implemented in practice

(article.gmane.org)

114 points · davidtgoldblatt · 10y ago · 111 comments

nathanb · 10y ago · 28 in thread

I have difficulty accepting "let's replace C with X", where X is a memory-managed language. As a systems programmer (I write SCSI driver code in C), I can't overemphasize how important it is to be able to address memory as a flat range of bytes, regardless of how that memory was originally handed to me. I need to have uint8_t* pointers into the middle of buffers which I can then typecast into byte-aligned structs. If your memory manager would not allow this or would move this memory around, that's a non-starter.

I don't stick with C because I love it. If I'm writing something for my own purposes, I use Ruby. I've written some server code in Golang (non-production), and it's pretty nifty, even if the way it twists normal C syntax breaks my brain. I even dabble in the dark side (C++) personally and professionally from time to time. And in a previous life, I was reasonably proficient in C# (that's the CLR 2.0 timeframe; I'm completely useless at it in these crazy days of LINQ and the really nifty CLR 4 features...and there's probably even more stuff I haven't even become aware of).

But none of those languages would let me do what I need to do: zero-copy writes from the network driver through to the RAID backend. And even if they did, the pain of rewriting our entire operating system in Go or Rust or whatever would be way more than the alleviated pain of using a "nicer" language.

(We never use 'int', by the way. We use the C99 well-defined types in stdint.h. Could this value go greater than a uint32_t can represent? Make it a uint64_t. Does it need to be signed? No? Make sure it's unsigned. A lot of what he's complaining about is sloppy code. I don't care if your compiler isn't efficient when compiling sloppy code.)

daeken · 10y ago

Having written a lot of driver/kernel code and also dabbled in Rust, I have a hard time seeing anything that Rust can't do in this regard. All of the memory manipulation you will typically do is doable in Rust, though you will certainly lose some of the safety if you're bouncing things between types. The only real limitation you'll have is the inability (AFAIK) to do inline assembly.

There'd be no need to rewrite anything to work with Rust; the binaries emitted by the compiler should be just fine, assuming ABI compatibility. Maybe some changes to the way things are linked?

foogered · 10y ago

You should be able to use inline assembly in rust-nightly: https://doc.rust-lang.org/nightly/book/inline-assembly.html

vardump · 10y ago

Hmm... Because the 32-bit Windows IRQ handler doesn't save FPU/SIMD state, Rust also needs the ability to suppress any FPU/SIMD output when generating code to run in the 32-bit Windows kernel at DIRQL (i.e. in an interrupt handler). Otherwise the mayhem in userland will be rather interesting: the usermode thread's SIMD state gets corrupted.

It'd definitely be tricky to use Rust for kernel drivers, but still so tempting!

monocasa · 10y ago

Last time I checked, Rust wasn't great for intrusive data structures, as well as non-tree graphs. Is this better now?

to3m · 10y ago

I switched over to using size_t for array indexes a few years ago, and found this a big improvement. And once I switched to size_t, most of my uses of unsigned went away - either you care about the size (and you want a uintN_t), or you don't (and you probably want size_t).

lsiebert · 10y ago

I think this works best if you pass a pointer to an error structure in the parameters. I've found people often use signed integer types with arrays so they can return a -1 to signal that something went wrong.
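A minimal sketch of the pattern described here: the index comes back as a size_t through an out-parameter, and failure is reported through a separate error structure instead of a -1 sentinel. The names `search_error` and `find_index` are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error record; the field names are illustrative only. */
typedef struct {
    int code;            /* 0 means success */
    const char *message; /* static string describing the outcome */
} search_error;

/* Writes the index of the first occurrence of needle to *out_index.
 * Returns true on success; on failure returns false, fills *err,
 * and leaves *out_index untouched. */
bool find_index(const int *items, size_t count, int needle,
                size_t *out_index, search_error *err)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i] == needle) {
            *out_index = i;
            err->code = 0;
            err->message = "ok";
            return true;
        }
    }
    err->code = 1;
    err->message = "value not found";
    return false;
}
```

Because the index never doubles as an error code, it can stay unsigned and cover the full range of size_t.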

pjmlp · 10y ago

> But none of those languages would let me do what I need to do: zero-copy writes from the network driver through to the RAID backend.

Oberon, Modula-3 and D allow for it via their SYSTEM/Unsafe/@system modules, but the two former ones failed to make a dent in the OS market (for various reasons) and D still has some improvements to its memory model going on.

Also Ada and SPARK are usually the languages to reach for in life critical systems.

Also, let's not forget that before C became widespread outside UNIX, Modula-2 and Pascal dialects were saner alternatives.

pcwalton · 10y ago

> But none of those languages would let me do what I need to do: zero-copy writes from the network driver through to the RAID backend.

Why not?

aidenn0 · 10y ago

Are you on a 64-bit target? Then a uint32_t gets converted to a 64-bit signed value before any arithmetic operator is applied. There are some real pitfalls here, though they are often exaggerated.

Do you turn off the TBAA in your compiler? In my experience most systems programmers either turn it off, or don't know the rules.

[edit] I forgot int is typically 32-bits on 64-bit targets. The same argument would still apply for uint16_t and smaller though.
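A small sketch of the promotion rules under discussion, assuming the common case of a 32-bit int. The function names are made up for illustration. Operands narrower than int are promoted to (signed) int before arithmetic, which is harmless for the addition but is a real pitfall for the uint16_t multiplication.

```c
#include <stdint.h>

/* Both operands are promoted to int before the addition, so the sum
 * is not reduced modulo 256 here. */
int sum_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* The reduction modulo 256 only happens on conversion back to uint8_t. */
uint8_t sum_truncated(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* With a 32-bit int, x * x on uint16_t operands would be a signed int
 * multiplication, and 65535 * 65535 overflows int (undefined behavior).
 * Converting one operand to uint32_t first keeps the arithmetic unsigned. */
uint32_t square_u16(uint16_t x)
{
    return (uint32_t)x * x;
}
```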

stephencanon · 10y ago

> uint32_t gets converted to a 64-bit signed value before any arithmetic operator is applied.

Only if int is > 32 bits on your platform, which is quite rare these days.

dbaupp · 10y ago

Aren't numeric promotions only applied in mixed expressions? (However, I'm not a C language-lawyer, so I could very easily be incorrect.)

cremno · 10y ago

What pitfalls do you mean (assuming 32-bit wide int and uint32_t is unsigned int as both are very common)?

_kst_ · 10y ago

uint32_t is promoted to a signed 64-bit value only if int is 64 bits. Even on 64-bit systems, int is typically 32 bits.

Does TBAA mean type-based alias analysis?

https://en.wikipedia.org/wiki/Alias_analysis#Type-based_alia...

(EDIT: I keep forgetting HN doesn't support markdown.)

tokenrove · 10y ago

Most languages with garbage collectors also have mechanisms for working with memory that doesn't get touched by the collector, to support FFI.

Also, consider a language like Ada for driver work.

zamalek · 10y ago

> move this memory around

A lot of people seem to assume that Chris (the author) was talking about managed memory, which he never mentioned once. Managed memory is runtime safety, a type system is compile-time safety. He's complaining about the type system. As an example:

> address memory as a flat range of bytes [...] I can then typecast into byte-aligned structs

You should never have to do that. You shouldn't be able to do that. Your job should be far simpler. Look at unique_ptr: a whole class of bugs are eliminated by this ZERO cost abstraction. Possibly what Chris is advocating is being able to describe what an I/O port is to the compiler and then using that abstraction to write your SCSI driver. This intent should be compiled down to as-good machine code (if not better) than what your C compiler would have given you - in the same way that unique_ptr is compiled.

I don't think any existing language gets this right.

albinofrenchy · 10y ago

> You should never have to do that. You shouldn't be able to do that.

Someone, at some point, has to do this though. Custom memory allocators are more or less predicated on having a byte buffer you chop up and use like this.

I think the best we get -- especially in driver code -- is well-thought-out designs that have low-cost abstractions between the device details and the application logic. But that seems like a library detail more than a compiler or language one.

sharpneli · 10y ago

> I don't think any existing language gets this right.

Which is precisely the reason C is still used.

Until we have a language that produces at least as good results as C and is safer we're not going to see any change in this area.

sklogic · 10y ago

> I need to have uint8_t* pointers into the middle of buffers which I can then typecast into byte-aligned structs.

Quite a lot of CPUs would just trap here. Assuming that unaligned access is allowed is a sin.

bluetomcat · 10y ago

> Quite a lot of CPUs would just trap here

Or even worse -- for example, ARM CPUs usually round down the misaligned address to the closest boundary when alignment checking is disabled. This means that attempting to access a 4-byte int at location 11 will silently let the CPU access it at location 8. This can manifest in some very nasty bugs.
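One common way to sidestep the alignment problem is to copy the bytes out with memcpy instead of dereferencing a cast pointer. This is only a sketch; `load_u32` and `store_u32` are hypothetical helper names, and the value is read in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in host byte order from any address, aligned or not.
 * memcpy has no alignment requirement on its arguments. */
uint32_t load_u32(const uint8_t *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Writes a 32-bit value in host byte order to any address. */
void store_u32(uint8_t *p, uint32_t value)
{
    memcpy(p, &value, sizeof value);
}
```

With these helpers, an access at offset 11 of a byte buffer touches bytes 11 through 14 and nothing else, whatever the CPU does with misaligned word loads.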

Nursie · 10y ago

Really?

Because this is very common practice in device or network code, in my experience.

spoiler · 10y ago

I agree, and I'd like to add that it's not just this particular author, but most people who criticise C about its "insecurities" use sloppy code when they criticise C, which always bothers me. I'm far from being a C fan (I'm also a Ruby fanboy), but programming languages aren't safe, only code can be safe, and that depends entirely on the developer.

Yes, it's "easier" to introduce some bugs in C than Ruby (or Go, or whatever), but that's because whoever wrote that code with the bug didn't know C well enough. Is that C's fault? Same can be said about any language, really.

If you don't know that String#match returns nil on unsuccessful matches and try to call MatchData#[], you'll get a NPE (something along the lines of "undefined method `[]' for NilClass"). This is very similar to dereferencing a NULL pointer in C[1].

[1]: I know dereferencing a NULL pointer in C is undefined behaviour, but your program will crash—if you're lucky enough—when you try to work with NULL pointers when you don't expect them.

dbaupp · 10y ago

This is nonsense. C has a very weak type system and very weak runtime guarantees, making it much easier to introduce problems with no indication that something's up. Other languages with strong type systems and/or stronger runtime checks eliminate large classes of bugs that are very easy to trigger in C.

So, yes, it is C's "fault" that it doesn't protect against classes of bugs that many other languages do. Sure, those languages have some of the same bugs that C does, but they're missing most of the very worst ones and that's really powerful. For example, a garbage collector protects against accessing dangling pointers: it's just not something the programmer has to worry about at all.

Rejecting criticisms of C's safety inadequacies with "just code better"/"just learn the language better" doesn't work in practice: there have been too many high-profile vulnerabilities in C software, many of which would've been much harder to trigger in other languages.

freyr · 10y ago

Since the author of the post is also an author of LLVM, clang, and Swift, and the director of developer tools at Apple, he certainly understands the difference between sloppy and non-sloppy code, but he also knows that sloppy code is a reality and undefined behavior is dangerous.

So we can make everyone a better programmer, or we can make better languages, or we can throw in the towel and say things are good enough. I think he's suggesting the correct path.

pcwalton · 10y ago

The undefined behavior of C and C++ results in remote code execution to a degree completely unmatched by other languages.

Examples of dangerous UB in C always use deliberately sloppy code for pedagogical reasons. For real-world examples of problems, look at the CVE database.

geofft · 10y ago

> Yes, it's "easier" to introduce some bugs in C than Ruby (or Go, or whatever), but that's because whoever wrote that code with the bug didn't know C well enough. Is that C's fault? Same can be said about any language, really.

I thought the promise of computers is that we didn't need to have smart people working on repetitive, boring, error-prone jobs.

And yes, there's a line between languages like C, Ruby, Python, Objective-C, etc. on one hand that don't actively try to make bugs hard, and Ada, Rust, Haskell, Ur, etc. on the other. That line is not particularly lined up with something like interpreted vs. compiled or old vs. new, and if you look for the line there, you won't see it.

vardump · 10y ago

I don't know anyone who can write proper, secure, bug-free C code in a multithreaded environment. Some people, such as DJB [1], do get pretty close, though.

Much less those who can do the same with C++.

But when you have larger teams, it gets even harder. People just think so differently and misunderstand intentions without realizing.

I did think I could do that in my twenties. 15+ years later I have a lot more respect for C.

[1]: http://cr.yp.to/qmail/guarantee.html

MaulingMonkey · 10y ago

> most people who criticise C about its "insecurities" use sloppy code when they criticise C, which always bothers me

If you're really really lucky, your coworkers will only write sloppy code by accident. But unless you're only working on toy projects, statistics will catch up to you and sloppy code will happen. To err is human.

By NASA standards, I suspect most of your code has been written "sloppy". As has most of mine.

> but programming languages aren't safe, only code can be safe, and that depends entirely on the developer.

Languages can be safe in the sense that they can force code to be safe in specific ways, or at least warn you better with unsafe opt-ins or better static analysis.

We agree that the developer is to blame for the thousands of overflow CVEs out there.

One developer recognizes they're not an infallible robot, nor are their coworkers, nor is the new intern they're about to hire, and uses the tools at their disposal - static analysis, "safe" languages, etc. - to catch and fix some large percentage of certain mistakes they, and those they work with, make.

Another developer scoffs at the first for "blaming their tools" and tries to avoid mistakes with sheer willpower. By not setting up static analysis, maybe they save enough time to do an additional 10-20 code reviews over the course of the project.

All else being equal, who will end up with safer code?

> Yes, it's "easier" to introduce some bugs in C than Ruby (or Go, or whatever), but that's because whoever wrote that code with the bug didn't know C well enough.

If this is true, nobody knows C well enough. Find me a programmer who's written a sufficiently large C project without a single bug, and I will worship them as a living god.

> Is that C's fault? Same can be said about any language, really.

I don't care about fault, per se. But sure, let's blame C. And every other language. Let's not blind ourselves against their faults, and the possible ways we might improve them, and the possible ways we might adapt ourselves to them.

Let's not saddle ourselves with stone axes for the rest of our lives.

> If you don't know that String#match returns nil on unsuccessful matches and try to call MatchData#[], you'll get a NPE (something along the lines of "undefined method `[]' for NilClass"). This is very similar to dereferencing a NULL pointer in C[1].

Hence the point of talks such as "Null References: The Billion Dollar Mistake", and why some languages are designed to avoid letting you access potentially null/nil/nothing variables without checking that they aren't first.

deathanatos · 10y ago

> I have difficulty accepting "let's replace C with X", where X is a memory-managed language.

What about C++ (which adds RAII, which I believe is indispensable for writing correct code, especially over C) or Rust, which adds much better memory correctness? I'm in agreement where the article says,

> C, and derivatives like C++, is a very dangerous language to write safety/correctness critical software in, and my personal opinion is that it is almost impossible to write security critical software in it

Though I believe it can be done in C++, with some discipline (but much less than C would require).

> As a systems programmer (I write SCSI driver code in C),

I think SCSI driver code counts as a niche application

> I can't overemphasize how important it is to be able to address memory as a flat range of bytes, regardless of how that memory was originally handed to me. I need to have uint8_t* pointers into the middle of buffers which I can then typecast into byte-aligned structs.

My understanding is that C doesn't generally allow this; that's what the strict aliasing rule is, and what's "wrong"[1] with several of the examples in the article. IIRC, you can get a [unsigned] char * into a struct (but why?[2]), but attempting to cast a char * to a struct foo * is forbidden.

(Of course, with amends to the thread's original purpose, which is asking what the common layman understands / uses / depends on. Type aliasing is not well understood in my opinion. I'm not entirely confident I've got it right in this post.)
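As an illustration of the aliasing point, here is a sketch of the int/float case: reading a float's bits through a cast `uint32_t *` runs afoul of the strict aliasing rule, while copying the bytes with memcpy is well defined. The function name `float_bits` is hypothetical, and the code assumes a 32-bit float (IEEE 754 binary32 on most current platforms).

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit integer.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    /* *(uint32_t *)&f would access a float through an incompatible
     * lvalue type; memcpy copies the bytes without that problem. */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```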

> If your memory manager would not allow this or would move this memory around, that's a non-starter.

(same comments about Rust/C++)

> We never use 'int', by the way. We use the C99 well-defined types in stdint.h. Could this value go greater than a uint32_t can represent? Make it a uint64_t. Does it need to be signed? No? Make sure it's unsigned. A lot of what he's complaining about is sloppy code.

In my experience, this is a rare thing; especially while interviewing, I find the majority of candidates — claiming to be most comfortable in C (we allow language of choice, in the hopes that you choose your strongest!) — don't know what `size_t` is.

[1]: The array copy code is correct, but the author is lamenting optimizations that cannot be taken unless we assume the pointers don't alias; the int-to-float code is UB (hence why he writes "miscompile" in quotes; it's UB, so by definition there's no wrong output (though an error might be nice); this is also why "obvious" is in quotes: humans know what the programmer meant, but what the programmer wrote is UB; I think this is telling about C: human expectation and the language don't align, from a language-design standpoint, this is not good).

[2]: most of the time I see people reaching for a char-pointer-into-a-struct, or cast-char-pointer-to-struct, they're short-circuiting actually decoding some I/O byte stream into an in-memory data structure. This is not portable, unless, maybe, you use "packed" structures (which is still not portable, I believe), but then you're sacrificing performance by potentially having unaligned members in the struct (which are harder for the processor to deal with, and might require multiple (e.g., MIPS) or unaligned (e.g., x86, amd64) loads/stores).
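For comparison, a sketch of what "actually decoding" a byte stream can look like. The wire format here is invented for the example (a 2-byte type followed by a 4-byte length, both big-endian), and `wire_header` and `decode_header` are hypothetical names. Each field is assembled from individual bytes, so the result does not depend on struct padding, host endianness, or alignment of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the header. */
typedef struct {
    uint16_t type;
    uint32_t length;
} wire_header;

/* Size of the header on the wire: 2 + 4 bytes, no padding. */
enum { WIRE_HEADER_SIZE = 6 };

/* Decodes a header from buf. Returns 0 on success, -1 if buf is too short. */
int decode_header(const uint8_t *buf, size_t len, wire_header *out)
{
    if (len < WIRE_HEADER_SIZE) {
        return -1;
    }
    out->type = (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
    out->length = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                  ((uint32_t)buf[4] << 8) | (uint32_t)buf[5];
    return 0;
}
```

The cost is a few shifts per field; in exchange the code reads the same bytes the same way on any target.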

byuu · 10y ago · 4 in thread

I understand that in some cases, these heroic compiler optimizations can offer significant performance increases. We should keep C around as it is for when said performance is critical.

But surely, we can design a language that has no undefined behavior, without substantial deviations from C's syntax, and without massive performance penalties. This language would be great for things that prize security over performance.

And the trick is, we don't need to rewrite all software in existence in a new language to get here! C can be this language, all we need is a special compilation flag that replaces undefined behavior with defined behavior. Functions called inside a function's arguments? Say they evaluate left-to-right. Shift right on signed types? Say it's arithmetic. Size of a byte? Say it's 8-bits. memset(0x00) on something going out of scope? If the developer said to do it, do it anyway. Underlying CPU doesn't support this? Emulate it. If it can't be emulated, then don't use code that requires the safe flag on said architecture. Yeah, screw the PDP-11. And yeah, it'll be slower in some cases. Yes, even twice as slow in some cases. But still far better than moving to a bytecode or VM language.
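On the memset example specifically: a compiler may drop a memset on a buffer that is never read again, since the stores are dead. A widely used workaround, sketched below, writes through a volatile-qualified pointer. This is a convention compilers tend to honor rather than something the standard strictly guarantees for non-volatile objects, and `secure_zero` is a hypothetical name; where available, platform functions such as explicit_bzero or the optional C11 memset_s are intended for this job.

```c
#include <stddef.h>

/* Zeroes len bytes at ptr through a volatile-qualified pointer, so the
 * stores are much less likely to be removed as dead code. */
void secure_zero(void *ptr, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)ptr;
    while (len--) {
        *p++ = 0;
    }
}
```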

And when we have guaranteed behavior of C, we can write new DSLs that transcode to C, without carrying along all of C's undefined behavior with it.

You want to talk about writing in higher-level languages like Python and then having C for the underlying performance critical portions? Why not defined-behavior C for the security-critical and cold portions of code, and undefined-behavior C for the critical portions?

Maybe Google wouldn't accept the speed penalty; but I'd happily drop my personal VPS from ~8000 maximum simultaneous users to ~5000 if it greatly decreased the odds of being vulnerable to the next Heartbleed. But I'm not willing to completely abandon all C code, and drop down to ~200 simultaneous users, to write it in Ruby.

pcwalton · 10y ago

> But surely, we can design a language that has no undefined behavior, without substantial deviations from C's syntax, and without massive performance penalties.

…Including undefined behavior around memory allocation, in particular use-after-free?

What to do about that is the big question, in my mind. Other forms of UB can mostly be patched up straightforwardly with a clean design (though there are some tough questions around bounds checks). But when it comes to UAF, there are basically three ways you can go about this and still remain a runtimeless systems language:

1. Compromise on "no UB" for use-after-free. UAF remains undefined behavior. Some variants of Ada with dynamic memory allocation have this, and I believe many Pascals did this. It's a popular approach in many new systems languages, like Jonathan Blow's Jai.

2. Disallow dynamic allocation. This is the approach taken by SPARK and other hardened variants of Ada.

3. Allow dynamic allocation, but statically check it with a region system. This is Rust's approach. Eliminating memory safety problems in this way while avoiding a GC is pretty much unique to that language, though it's obviously influenced by many other systems that came before it (C++, Cyclone).

All of the options have serious downsides. Option (1) opens you up to what has become, in 2015, a very common RCE vector. Option (2) is very limiting and pretty much restricts your language to embedded development. Option (3) has large complexity and expressiveness costs (though once you've paid the cost you can get data race freedom without any extra work, which is nice). Altogether it's a really difficult problem with tough tradeoffs all around.

byuu · 10y ago

> Including undefined behavior around memory allocation, in particular use-after-free?

There are obviously going to be limits to what can be done. If you access beyond memory, you get "bad data" if the address is mapped by the OS, or a crash if it's not. That's a clear bug, and we can't make C a language that is incapable of producing programs with bugs. I don't really think of this as "undefined" ... we define very clearly that one of two things happens, based on the OS' memory layout. That's very different from GCC's understanding, where undefined == "if I want to have the program upload a cat picture to Reddit instead of shift a signed integer right, then that's what I'll do." (facetious, but you get the idea. Many of GCC's 'optimizations' cause outright security vulnerabilities, and defy all logic, like deleting chunks of code entirely.)

We want the most logical thing to happen when a user does something, not a completely unexpected thing just because it happens to make some compiler benchmark test look a little better.

> Other forms of UB can mostly be patched up straightforwardly with a clean design

I'm betting there aren't any C programmers out there that know 100% of the behaviors that are undefined. I've been programming for 18 years, and I got bit the other day because I had "print(sqlRow.integer(), ", ", sqlRow.integer());" ... where the .integer() call incremented the internal read position. MinGW decided to evaluate the second call first, and then the first one, so my output ended up backward. You may think that one's obvious, just like I might think that a shift by more bits than the integer type holds being undefined is obvious, but there are people that would be surprised by both.

Stating that function arguments evaluate left-to-right, just like "operator," does in expressions, would be an infinitesimal speed hit on strange systems, and no speed hit at all on modern systems that can just as easily use an indirect move to set up the stack frame.
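In today's C the usual fix for this is to sequence the calls yourself. The sketch below is a plain C analogue of the cursor-style reader described above; `read_position`, `next_field` and `read_pair` are hypothetical names. Separate statements are sequenced one after another, whereas the evaluation order of the arguments of a single call is unspecified.

```c
/* Hypothetical cursor: each call to next_field advances the position. */
int read_position = 0;

int next_field(void)
{
    return ++read_position;
}

/* A call such as use(next_field(), next_field()) may see (1, 2) or (2, 1),
 * depending on the compiler. Assigning to named temporaries in separate
 * statements fixes the order: first always receives the earlier field. */
void read_pair(int *first, int *second)
{
    *first = next_field();
    *second = next_field();
}
```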

And if you have a processor that can't do arithmetic shift right, which would be extremely rare, then generate that processor's equivalent of "((x & m) ^ b) - b" after the shift.
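A sketch of how an arithmetic right shift can be built from a logical one, using a variant of the xor-and-subtract idiom mentioned above (the mask step is unnecessary here because the logical shift already zero-fills). The name `asr32` is hypothetical, and the shift count is assumed to be in the range 0 to 31.

```c
#include <stdint.h>
#include <string.h>

/* Arithmetic shift right of a 32-bit value without relying on the
 * implementation-defined behavior of >> on negative signed integers.
 * Precondition: shift <= 31. */
int32_t asr32(int32_t value, unsigned shift)
{
    uint32_t x = (uint32_t)value >> shift;     /* logical shift, zero fill */
    uint32_t b = UINT32_C(1) << (31u - shift); /* where the sign bit landed */
    uint32_t r = (x ^ b) - b;                  /* sign-extend from that bit */
    int32_t result;
    memcpy(&result, &r, sizeof result);        /* reinterpret as two's complement */
    return result;
}
```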

bandrami · 10y ago

There's a fourth option: disallowing freeing. A resource, once allocated, cannot be repurposed. It sounds crazy for those of us who grew up with the resource limitations of the 1980s, but as time passes it becomes an increasingly interesting idea.

pjmlp · 10y ago

Regarding Pascal, Borland dialects allowed for regions, but of course they are scope based.

mrpippy · 10y ago · 4 in thread

For the for loop example, is there some reason why clang doesn't output a warning like "Does 'i' really need to be signed? If so, explicitly make it a 'signed int'. Otherwise, change it to be unsigned"?

porges · 10y ago

The point here is not particularly about signedness, it's that UB allows better optimizations to be performed.

If overflow is defined to wrap around then it's potentially an infinite loop (take N == MAXVALUE). With overflow defined as UB you can say the loop executes exactly N times (because you're not allowed to write code that overflows).

So UB is both bad and a source of power :)
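A rough sketch of the two cases, with hypothetical function names. In the signed loop the compiler is allowed to assume `i` never overflows, so it can treat the trip count as exactly n + 1; unsigned arithmetic, by contrast, is defined to wrap, so a loop of the same shape with an unsigned index and n equal to the maximum value would never terminate, and the compiler has to allow for that.

```c
/* Signed index: overflow of i would be undefined, so for n >= 0 the loop
 * can be assumed to run exactly n + 1 times. Passing the largest int value
 * is the caller's bug, because ++i would then overflow. */
long count_signed(int n)
{
    long iterations = 0;
    for (int i = 0; i <= n; ++i) {
        iterations++;
    }
    return iterations;
}

/* Unsigned arithmetic wraps modulo 2^N by definition: incrementing the
 * largest unsigned value yields 0 rather than undefined behavior. */
unsigned wrap_increment(unsigned i)
{
    return i + 1u;
}
```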

cowsandmilk · 10y ago

> The point here is not particularly about signedness

But in the case of C, that is what it is about since unsigned integers have defined behavior, so you can only have UB and the optimizations when you use a signed integer.

GFK_of_xmaspast · 10y ago

If N is unsigned, clang++ (and I think g++) will warn about signed/unsigned comparison, but sometimes you do want signed loop indices.
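What the signed/unsigned comparison warning guards against can be sketched as follows; the function names are hypothetical. In a mixed comparison where int and unsigned have the same width, the signed operand is converted to unsigned, so -1 turns into the largest unsigned value. The first function spells that conversion out with an explicit cast; the second handles negative values before converting.

```c
/* Equivalent to writing a < b with mixed types: a is converted to
 * unsigned first, so a negative a compares as a huge value. */
int implicit_less(int a, unsigned b)
{
    return (unsigned)a < b;
}

/* Handles the sign explicitly: any negative a is less than any unsigned b. */
int safe_less(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```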

tacos · 10y ago

Microsoft's compiler has been ALL OVER THIS for 20 years.

Nothing says "buggy port of Linux app to Windows" faster than "#pragma warning (disable: 4018)".

Clang is catching up. GCC doesn't seem to care as much. Given all the bitching about undefined behavior you'd think they'd up the warnings.

Compilers are programs too, y'know. We can define (and gasp, re-define) behavior.

carlosrg · 10y ago · 2 in thread

Until I see really big and open source projects like WebKit or Clang itself moving to Swift or whatever, anything I read about moving to "better systems languages" is like reading a letter to Santa Claus. I doubt C++ is going anywhere, especially when C++ itself is not standing still and evolving (C++11, 14, 17...) while maintaining backwards compatibility.

pjmlp · 10y ago

What about having early versions of Mac OS written in Object Pascal, only to rewrite it in C for the pleasure of the UNIX hordes?

GFK_of_xmaspast · 10y ago

I wasn't a mac user until later, are you saying that was done for reasons of ideology?

mcguire · 10y ago · 1 in thread

"In the first example above, it is that 'int' is the default type people generally reach for, not 'long', and that array indexing is expressed with integers instead of iterators. This isn’t something that we’re going to 'fix' in the C language, the C community, or the body of existing C code."

The majority of that message is pretty well said, but this particular part leaves me cold. The problem isn't that 'int' is the default type, not 'long', nor is it that array indexing isn't done with iterators. (Ever merged two arrays? It's pretty clear using int indexes or pointers, but iterators can get verbose. C++ does a very good job, though, by making iterators look like pointers.) The problem is that, in C, the primitive types don't specifically describe their sizes. If you want a 32-bit variable, you should be able to ask for an unsigned or signed 32-bit variable. If you want whatever is best on this machine, you should be able to ask for whatever is word-sized. Unfortunately, C went with char <= short <= int <= long (, long long, etc.); in an ideal world, 'int' would be the machine's word size, but when all the world's a VAX, 'int' means 32 bits.

That is one of the major victories with Rust: most primitive types are sized, with an additional word-sized type.

Gibbon1 · 10y ago

Then again with C99 you do have stdint.h which gives you defined width types as well as minimum width types. And others.
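A short sketch of the stdint.h families mentioned here. The struct and its field names are invented for illustration; the type names themselves are the standard C99 ones.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record showing one type from each stdint.h family. */
typedef struct {
    uint32_t       sector_count; /* exactly 32 bits (optional in the standard) */
    uint_least16_t retry_limit;  /* smallest type with at least 16 bits */
    uint_fast8_t   flags;        /* at least 8 bits, whatever is fastest */
    uintmax_t      total_bytes;  /* widest unsigned integer type */
} disk_stats;

/* Storage size of a type in bits. */
#define STORAGE_BITS(type) (sizeof(type) * CHAR_BIT)

size_t sector_count_bits(void)
{
    return STORAGE_BITS(uint32_t);
}

size_t retry_limit_bits(void)
{
    return STORAGE_BITS(uint_least16_t);
}
```

The exact-width types answer "I need precisely N bits"; the least and fast variants answer "at least N bits, pick what suits the machine".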

ryanmarsh · 10y ago · 1 in thread

Have we lost sight of the fact that when we talk about a programming language we're really talking about how to put bits on CPU registers?

kd0amg · 10y ago

We aren't. Most programming languages do not expose CPU registers to the programmer (because they are not semantically important), and most programmers do not think in terms of moving bits in and out of registers (and typically don't have much to gain from doing so). CPU registers are just an accident of implementation. Programming as a general activity is not inherently tied to a register-based CPU. In fact, programmers are often happy to have compilers/machines remove some put-bits-into-register activities.

JustSomeNobody · 10y ago · 1 in thread

What would be considered "security critical"? SSH? IPTables? Linux kernel?

AgentME · 10y ago

Personally, as someone that doesn't want to be hacked, I would think anything I run that connects to the internet and isn't already very well sandboxed.

jeffreyrogers · 10y ago

For those who don't know, Chris Lattner[1], who wrote this post, is the primary author of LLVM and more recently of Swift, so he knows a bit about what he's talking about :)

[1]: https://en.wikipedia.org/wiki/Chris_Lattner

pjmlp · 10y ago

"My hope is that the industry will eventually move to better systems programming languages, but that will take a very very long time..."

-- Chris Lattner

Yes, a very long time. Modula-2 was born in 1978, but we can go back to Algol and Lisp even.

nikanj · 10y ago

Once again, http://research.microsoft.com/en-us/people/mickens/thenightw...

j / k navigate · click thread line to collapse

111 comments

51 comments · 10 top-level

nathanb10y ago· 28 in thread

daeken10y ago

There'd be no need to rewrite anything to work with Rust; the binaries emitted by the compiler should be just fine, assuming ABI compatibility. Maybe some changes to the way things are linked?

foogered10y ago

You should be able to use inline assembly in rust-nightly: https://doc.rust-lang.org/nightly/book/inline-assembly.html

1 more reply

vardump10y ago

It'd definitely be tricky to use Rust for kernel drivers, but still so tempting!

2 more replies

monocasa10y ago

Last time I checked, Rust wasn't great for intrusive data structures, as well as non-tree graphs. Is this better now?

1 more reply

to3m10y ago

lsiebert10y ago

pjmlp10y ago

> But none of those languages would let me do what I need to do: zero-copy writes from the network driver through to the RAID backend.

Also Ada and SPARK are usually the languages to reach for in life critical systems.

Also lets not forget before C became widespread outside UNIX, Modula-2 and Pascal dialects were saner alternatives.

pcwalton10y ago

> But none of those languages would let me do what I need to do: zero-copy writes from the network driver through to the RAID backend.

Why not?

aidenn010y ago

Are you on a 64-bit target? Then a uint32_t gets converted to a 64-bit signed value before any arithmetic operator is applied. There are some real pitfalls here, though they are often exaggerated.

Do you turn off the TBAA in your compiler? In my experience most systems programmers either turn it off, or don't know the rules.

[edit] I forgot int is typically 32-bits on 64-bit targets. The same argument would still apply for uint16_t and smaller though.

stephencanon10y ago

> uint32_t gets converted to a 64-bit signed value before any arithmetic operator is applied.

Only if int is > 32 bits on your platform, which is quite rare these days.

dbaupp10y ago

Aren't numeric promotions only applied in mixed expressions? (However, I'm not a C language-lawyer, so I could very easily be incorrect.)

2 more replies

cremno10y ago

What pitfalls do you mean (assuming 32-bit wide int and uint32_t is unsigned int as both are very common)?

_kst_10y ago

uint32_t is promoted to a signed 64-bit value only if int is 64 bits. Even on 64-bit systems, int is typically 32 bits.

Does TBAA mean type-based alias analysis?

https://en.wikipedia.org/wiki/Alias_analysis#Type-based_alia...

(EDIT: I keep forgetting HN doesn't support markdown.)
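
To make the TBAA point concrete, here is a hedged C11 sketch of the usual well-defined alternative to pointer-cast type punning. It assumes a 32-bit `float` (checked by the `static_assert`); the function names are illustrative only. GCC and Clang also accept `-fno-strict-aliasing`, which is the "turn it off" option mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Reading a float through a uint32_t pointer, as in *(uint32_t *)&f,
   breaks the aliasing rules, so an optimizer using TBAA may assume the
   two objects never overlap. Copying the bytes is well defined. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The reverse direction, again by copying the object representation. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Accesses through `unsigned char *` are exempt from the aliasing rules, which is why byte-wise copies like these remain valid.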

tokenrove10y ago

Most languages with garbage collectors also have mechanisms for working with memory that doesn't get touched by the collector, to support FFI.

Also, consider a language like Ada for driver work.

zamalek10y ago

> move this memory around

> address memory as a flat range of bytes [...] I can then typecast into byte-aligned structs

I don't think any existing language gets this right.

albinofrenchy10y ago

> You should never have to do that. You shouldn't be able to do that.

Someone, at some point, has to do this though. Custom memory allocators are more or less predicated on having a byte buffer you chop up and use like this.
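
A minimal sketch of what "a byte buffer you chop up" can look like: a bump (arena) allocator in C11. The `arena` type and both function names are hypothetical, and the design (no per-object free, power-of-two alignment) is one common choice rather than the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: a caller-supplied byte buffer plus a cursor. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

void arena_init(struct arena *a, void *buffer, size_t size)
{
    assert(a != NULL && buffer != NULL);
    a->base = buffer;
    a->size = size;
    a->used = 0;
}

/* Carve n bytes out of the buffer, aligned to align (a power of two).
   Returns NULL when the buffer is exhausted. Nothing is freed
   individually; the whole arena is reset at once. */
void *arena_alloc(struct arena *a, size_t n, size_t align)
{
    assert(a != NULL && align != 0 && (align & (align - 1)) == 0);

    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);

    if (padding > a->size - a->used || n > a->size - a->used - padding)
        return NULL;

    a->used += padding + n;
    return a->base + (a->used - n);
}
```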

sharpneli10y ago

> I don't think any existing language gets this right.

Which is precisely the reason C is still used.

Until we have a language that produces results at least as good as C's and is safer, we're not going to see any change in this area.

sklogic10y ago

> I need to have uint8_t* pointers into the middle of buffers which I can then typecast into byte-aligned structs.

Quite a lot of CPUs would just trap here. Assuming that unaligned access is allowed is a sin.
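
One way to get a struct out of the middle of a byte buffer without assuming unaligned access works is to copy it out with `memcpy`. The sketch below is C11; the `wire_header` layout and the `read_header` name are assumptions made up for the example, and fields are taken in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-the-wire header; the field layout is an assumption. */
struct wire_header {
    uint16_t opcode;
    uint16_t flags;
    uint32_t length;
};

/* Casting buf + offset to struct wire_header * and dereferencing it is
   undefined when the address is misaligned, and traps on some CPUs.
   memcpy has no alignment requirement; compilers typically lower it to a
   plain load on targets where unaligned loads are allowed. */
int read_header(const uint8_t *buf, size_t buf_len, size_t offset,
                struct wire_header *out)
{
    assert(buf != NULL && out != NULL);
    if (offset > buf_len || buf_len - offset < sizeof *out)
        return -1;                      /* not enough bytes */
    memcpy(out, buf + offset, sizeof *out);
    return 0;
}
```

This costs a copy of the header only; the payload can still be passed along by pointer, so it does not rule out zero-copy data paths.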

bluetomcat10y ago

> Quite a lot of CPUs would just trap here

Nursie10y ago

Really?

Because this is very common practice in device or network code, in my experience.

spoiler10y ago

[1]: I know dereferencing a NULL pointer in C is undefined behaviour, but your program will crash—if you're lucky enough—when you try to work with NULL pointers when you don't expect them.

dbaupp10y ago

freyr10y ago

So we can make everyone a better programmer, or we can make better languages, or we can throw in the towel and say things are good enough. I think he's suggesting the correct path.

pcwalton10y ago

The undefined behavior of C and C++ results in remote code execution to a degree completely unmatched by other languages.

Examples of dangerous UB in C always use deliberately sloppy code for pedagogical reasons. For real-world examples of problems, look at the CVE database.

geofft10y ago

I thought the promise of computers is that we didn't need to have smart people working on repetitive, boring, error-prone jobs.

vardump10y ago

I don't know anyone who can write proper, secure, bug-free C code in a multithreaded environment. Some people, such as DJB [1], do get pretty close, though.

Much less those who can do the same with C++.

But when you have larger teams, it gets even harder. People just think so differently and misunderstand intentions without realizing.

I did think I could do that in my twenties. 15+ years later I have a lot more respect for C.

[1]: http://cr.yp.to/qmail/guarantee.html

MaulingMonkey10y ago

> most people who criticise C about its "insecurities" use sloppy code when they criticise C, which always bothers me

By NASA standards, I suspect most of your code has been written "sloppy". As has most of mine.

> but programming languages aren't safe, only code can be safe, and that depends entirely on the developer.

Languages can be safe in the sense that they can force code to be safe in specific ways, or at least warn you better with unsafe opt-ins or better static analysis.

We agree that the developer is to blame for the thousands of overflow CVEs out there.

All else being equal, who will end up with safer code?

> Yes, it's "easier" to introduce some bugs in C than Ruby (or Go, or whatever), but that's because whoever wrote that code with the bug didn't know C well enough.

If this is true, nobody knows C well enough. Find me a programmer who's written a sufficiently large C project without a single bug, and I will worship them as a living god.

> Is that C's fault? Same can be said about any language, really.

Let's not saddle ourselves with stone axes for the rest of our lives.

deathanatos10y ago

> I have difficulty accepting "let's replace C with X", where X is a memory-managed language.

Though I believe it can be done in C++, with some discipline (but much less than C would require).

> As a systems programmer (I write SCSI driver code in C),

I think SCSI driver code counts as a niche application

> If your memory manager would not allow this or would move this memory around, that's a non-starter.

(same comments about Rust/C++)

byuu10y ago· 4 in thread

I understand that in some cases, these heroic compiler optimizations can offer significant performance increases. We should keep C around as it is for when said performance is critical.

And when we have guaranteed behavior of C, we can write new DSLs that transcode to C, without carrying along all of C's undefined behavior with them.

pcwalton10y ago

> But surely, we can design a language that has no undefined behavior, without substantial deviations from C's syntax, and without massive performance penalties.

…Including undefined behavior around memory allocation, in particular use-after-free?

2. Disallow dynamic allocation. This is the approach taken by SPARK and other hardened variants of Ada.

byuu10y ago

> Including undefined behavior around memory allocation, in particular use-after-free?

We want the most logical thing to happen when a user does something, not a completely unexpected thing just because it happens to make some compiler benchmark test look a little better.

> Other forms of UB can mostly be patched up straightforwardly with a clean design

And if you have a processor that can't do arithmetic shift right, which would be extremely rare, then generate that processor's equivalent of "((x & m) ^ b) - b" after the shift.
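
The "((x & m) ^ b) - b" trick can be sketched in C11 as follows. Here `m` masks the bits that survive a logical shift and `b` marks where the sign bit ends up; the function name `asr32` is made up for the example. It works on `uint32_t` bit patterns so that every step is defined, since right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic (sign-propagating) right shift of a 32-bit two's-complement
   bit pattern, built only from a logical shift. */
uint32_t asr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = x >> n;              /* logical shift: zeros come in */
    uint32_t m = UINT32_MAX >> n;           /* mask of the 32 - n valid bits */
    uint32_t b = UINT32_C(1) << (31 - n);   /* where the sign bit ended up */
    return ((shifted & m) ^ b) - b;         /* sign-extend from bit 31 - n */
}
```

For example, the pattern for -16 (0xFFFFFFF0) shifted by 2 yields the pattern for -4 (0xFFFFFFFC).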

bandrami10y ago

pjmlp10y ago

Regarding Pascal, Borland dialects allowed for regions, but of course they are scope based.

mrpippy10y ago· 4 in thread

porges10y ago

The point here is not particularly about signedness, it's that UB allows better optimizations to be performed.

So UB is both bad and a source of power :)

cowsandmilk10y ago

> The point here is not particularly about signedness

But in the case of C, that is what it is about since unsigned integers have defined behavior, so you can only have UB and the optimizations when you use a signed integer.

GFK_of_xmaspast10y ago

If N is unsigned, clang++ (and I think g++) will warn about signed/unsigned comparison, but sometimes you do want signed loop indices.

tacos10y ago

Microsoft's compiler has been ALL OVER THIS for 20 years.

Nothing says "buggy port of Linux app to Windows" faster than "#pragma warning (disable: 4018)".

Clang is catching up. GCC doesn't seem to care as much. Given all the bitching about undefined behavior you'd think they'd up the warnings.

Compilers are programs too, y'know. We can define (and gasp, re-define) behavior.

carlosrg10y ago· 2 in thread

pjmlp10y ago

What about having early versions of Mac OS written in Object Pascal, only to rewrite it in C for the pleasure of the UNIX hordes?

GFK_of_xmaspast10y ago

I wasn't a Mac user until later. Are you saying that was done for reasons of ideology?

mcguire10y ago· 1 in thread

That is one of the major victories with Rust: most primitive types are sized, with an additional word-sized type.

Gibbon110y ago

Then again with C99 you do have stdint.h which gives you defined width types as well as minimum width types. And others.
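
A brief C11 sketch of the three families in `stdint.h`: exact-width, minimum-width, and "fast" types. The `block_request` struct and `max_lba` function are hypothetical, there only to show the types chosen by intent; the `static_assert` lines spell out what each family guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Exact-width types: exactly N bits (optional in the standard, but
   present on mainstream targets). */
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

/* Minimum-width types: the smallest type with at least N bits. */
static_assert(sizeof(uint_least16_t) * CHAR_BIT >= 16, "at least 16 bits");

/* Fast types: at least N bits, in whatever width the target prefers. */
static_assert(sizeof(uint_fast8_t) * CHAR_BIT >= 8, "at least 8 bits");

/* Hypothetical record using the types by intent rather than by habit. */
struct block_request {
    uint64_t lba;          /* may exceed 2^32, so 64 bits */
    uint32_t length;       /* fixed width, as in an assumed wire format */
    uint_least16_t tag;    /* only needs 16 bits of range */
};

/* The limits are available as macros as well. */
uint64_t max_lba(void)
{
    return UINT64_MAX;
}
```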

ryanmarsh10y ago· 1 in thread

Have we lost sight of the fact that when we talk about a programming language we're really talking about how to put bits on CPU registers?

kd0amg10y ago

JustSomeNobody10y ago· 1 in thread

What would be considered "security critical"? SSH? IPTables? Linux kernel?

AgentME10y ago

Personally, as someone that doesn't want to be hacked, I would think anything I run that connects to the internet and isn't already very well sandboxed.

jeffreyrogers10y ago

For those who don't know, Chris Lattner[1], who wrote this post, is the primary author of LLVM and more recently of Swift, so he knows a bit about what he's talking about :)

[1]: https://en.wikipedia.org/wiki/Chris_Lattner

pjmlp10y ago

"My hope is that the industry will eventually move to better systems programming languages, but that will take a very very long time..."

-- Chris Lattner

Yes, a very long time. Modula-2 was born in 1978, but we can go back even further, to Algol and Lisp.

nikanj10y ago

Once again, http://research.microsoft.com/en-us/people/mickens/thenightw...
