0 points · j-pb · 3y ago · 0 comments

The major difference between Rust and [Zig, C] is that the latter are essentially "cocky macro assemblers". They derive their semantics from the semantics of the underlying assembly. Take https://github.com/ziglang/zig/issues/12251 as an example. Several people from the Rust and Haskell communities complained on Twitter about how awful a language design this is. However, when you think about what the language actually compiles to, it makes perfect sense and is pretty straightforward.

Rust has the goal of putting as much smartness between what you type and what the compiler produces, which is a perfectly fine goal, but without a specification for the input semantics it's a pretty wobbly thing, especially for a systems programming language. A lot of bit-packing code is simply not writable in Rust today without immediately invoking UB; that works for now but might break with any bugfix release.

19 comments · 5 top-level

adwn · 3y ago · 6 in thread

> The major difference between Rust and [Zig, C] is that the latter are essentially "cocky macro assemblers". They derive their semantics from the semantics of the underlying assembly.

I don't know about Zig, but as for C, this take is not just wrong, but dangerously wrong. The behavior of C code isn't defined by the underlying assembly, but by an abstract machine model that may or may not match your real machine (narrator: It doesn't). As a result, the compiler can and will ignore the intention of your code when it can prove that your code would invoke UB. Good example: signed integer arithmetic and overflow tests. It is very difficult to write overflow tests for signed integer operations without accidentally invoking UB, and modern compilers will simply remove your overflow tests instead of translating them 1:1 to assembly instructions.
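To make the overflow-test point concrete, here is a minimal C sketch. The function names `add_overflows_naive` and `add_overflows` are made up for illustration. The naive version performs the addition first, so by the time the comparison runs an overflow has already invoked UB, and an optimizer is allowed to assume the comparison is always false. The second version only compares against `INT_MAX` and `INT_MIN`, so no overflowing operation is ever evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive test (hypothetical name): the addition itself overflows, which is
 * UB for signed int, so a compiler may assume it never happens and fold
 * the whole test to "false". Shown only as the pattern to avoid. */
bool add_overflows_naive(int a, int b) {
    return b > 0 && a + b < a;
}

/* Overflow-free test: checks against the limits before any addition,
 * so no signed overflow is ever evaluated. */
bool add_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}
```

GCC and Clang also offer `__builtin_add_overflow`, which performs the addition and reports overflow in one step; whether to use it depends on how portable the code needs to be.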

j-pb (OP) · 3y ago

Assembly in the macro-assembler sense. Doesn't matter that it's obscure and not a direct mapping to any specific ISA, there is still relatively straightforward mapping between syntactic constructs and something von neumann-ish.

Rust simply lacks that mapping, yet.

ralfj · 3y ago

Rust has that mapping in the same sense that C does: there exists a fairly simple way to implement every language construct (except maybe for dynamic method invocations) in assembly.

However, Rust and C alike have another whole dimension to their semantics, that of Undefined Behavior, which is not reflected in the assembly, and which needs to be taken into account for unsafe code authors (in Rust) / by all programmers (in C). See for example https://www.ralfj.de/blog/2019/07/14/uninit.html for what goes wrong when you think of C as just a macro assembler.

2 more replies

formerly_proven · 3y ago

If we understand "straightforward" in the sense that small changes in the input lead to predictable and small changes in the output, then I can't see how this is reasonably true for pretty much any optimizing compiler.

Just today we had this nice example on the front page where initializing a variable differently completely changed the codegen for large parts of the program: https://jpieper.com/2022/08/05/debugging-bare-metal-stm32-fr...

2 more replies

Diggsey · 3y ago

This is still wrong though, since UB doesn't exist at an assembly level, and yet C has it. Therefore there is no clear mapping from C to assembly (even considering hypothetical ISAs).

1 more reply

lmm · 3y ago

> Doesn't matter that it's obscure and not a direct mapping to any specific ISA, there is still relatively straightforward mapping between syntactic constructs and something von neumann-ish.

No there isn't. "Something von neumann-ish" would have behaviour for all inputs - maybe not desirable behaviour, maybe even different behaviour on different processor revisions, but it would have behaviour. The C abstract machine doesn't.
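A small illustration of that gap, assuming a 32-bit value: hardware shift instructions generally do something for every shift count (x86, for instance, masks the count for 32-bit shifts), whereas in C shifting by the width of the type or more is undefined. A version with behaviour for all inputs has to spell out the out-of-range case by hand. `shl_total` is a hypothetical helper name, and returning 0 for large counts is just one possible choice.

```c
#include <stdint.h>

/* Left shift that is defined for every count. In plain C, `x << n` with
 * n >= 32 is undefined for a 32-bit operand, so that case is handled
 * explicitly here (assumption: "shifted everything out" means 0). */
uint32_t shl_total(uint32_t x, unsigned n) {
    return n < 32 ? (uint32_t)(x << n) : 0u;
}
```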

adwn · 3y ago

> Assembly in the macro-assembler sense. Doesn't matter that it's obscure and not a direct mapping to any specific ISA, there is still relatively straightforward mapping between syntactic constructs and something von neumann-ish.

That applies to pretty much all imperative languages that compile directly to assembly/machine code – including Rust.

eptcyka · 3y ago · 4 in thread

Rust has no explicit goal to introduce magic; instead, the goal is to be safe and sound. I'd argue the code very often compiles to what one might expect it to compile to, and it's certainly better than C++ when it comes to magic, in terms of how useful and transparent said magic is.

j-pb (OP) · 3y ago

C++ is arguably the most complex programming language ever. There is not much to be gained by comparing against the supremum.

Take a look at the languages that Rust was influenced by (https://en.wikipedia.org/wiki/Rust_(programming_language)); those aren't languages with straightforward compilation semantics.

There is a reason why Rust has a datalog engine built into the compiler (https://github.com/rust-lang/datafrog). Which is imho totally rad and awesome, but really hard to fully form a mental model of without a spec.

Rusky · 3y ago

This is pretty vague reasoning. Rust draws inspiration from each of those languages for something but that doesn't mean they affect its operational semantics, which are quite similar to C's.

The fact that you point to datafrog is illustrative: while it is not actually built into the compiler today, the use case for it is borrow checking, which famously does not impact the language's operational behavior at all! It is purely a compile-time analysis that does not influence code generation.

(For example, consider that mrustc and the GCC frontend are both able to omit borrow checking entirely and yet still produce runnable binaries!)

Of course, the borrow checker is still something that you need to build a mental model for, but by design it is okay for you to get that wrong occasionally, because the result can only ever be "your program still does what you thought, but the analysis proved that in a way you did not expect" or "your program does not compile."

pjmlp · 3y ago

I bet PL/I, Ada, and even Python 3 aren't far off.

Same applies to the Java and .NET ecosystem, because either you swim on the surface, or you really get to know how the implementations, down to bytecode, JIT, GC and standard libraries work, and now they are full speed ahead with 6 month release schedules.

cztomsik · 3y ago

After 5 years of (hobby but frequent) Rust, I'm pretty confident most people cannot tell what their code will compile to and what its runtime profile is going to be (aka you can't reason about performance -> so there is a lot of magic/complexity).

I've recently tried Zig and switched to it instantly. It's hard to explain, but basically Andrew has very good taste in picking important features and keeping the language complexity very low.

You know how it takes some time to learn the borrow checker and macros and generics and traits and all the weird rules of what you cannot do, and then trait bounds, and then it doesn't work exactly like you need, or the crate does not support something and you cannot implement it yourself, etc. etc.

So in Zig I had a hello world on day one, and the first thing I did was encode/decode ANY JSON messages for tagged unions (which unfortunately is not supported in std, but it was very easy to do it myself) and it worked! I did this on the first day in an entirely new language (I was not even doing C/C++ before). It would probably take a few days in Rust, I'd probably mess something up, and I know it wouldn't work for every case and every crate because of the orphan rule. In Zig it would work for any struct, internal or external. And also Debug, Display, Eq, Partial, all of that works automatically. That's huge!

And the worst thing is that recently I've started using pointers again, and when I look at the code I don't see anything unsafe in the structure itself. It can be used unsafely, but that is also very easy to fix in Zig because you have these explicit allocators and it's so easy to put everything in the same arena transparently, or use SegmentedList with stable pointers.

kmeisthax · 3y ago · 2 in thread

C used to be a "cocky macroassembler", up until people started writing optimizing compilers for it.

If your compiler merely translates a source line into a series of assembly mnemonics, function calls, or interpreter gotos, then the interface is the implementation. You can rely on the underlying target language to provide your program with meaning and the only people who have to care are people reimplementing your compiler for compatibility.

The moment you start talking about optimization, then this no longer works. You no longer have a correspondence between source and compiled forms of one program. You have a many-to-many relationship where one source form can be compiled into hundreds of binaries depending on how the compiler is configured, and many source forms may actually optimize to the exact same compiled form. This requires you to provide your own semantics, else compiled programs have no meaning and -O3 becomes shorthand for "make demons fly out my nose".
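A tiny C sketch of the "many source forms, one compiled form" half of that relationship. The three function names are made up for illustration. On unsigned 32-bit values all three compute the same result (doubling modulo 2^32), so an optimizing compiler is free to, and often will, emit identical instructions for them. Nothing guarantees that, which is rather the point.

```c
#include <stdint.h>

/* Three different spellings of "double x modulo 2^32". Unsigned
 * arithmetic wraps by definition, so all three agree on every input. */
uint32_t twice_mul(uint32_t x)   { return (uint32_t)(x * 2u); }
uint32_t twice_add(uint32_t x)   { return (uint32_t)(x + x); }
uint32_t twice_shift(uint32_t x) { return (uint32_t)(x << 1); }
```

Comparing the output of a compiler at `-O0` and `-O2` for these functions is one way to see the correspondence between source and binary loosen as optimization is turned on.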

In the case of C they came up with a series of rules for what-not-to-do that both did not match existing language semantics and also were dangerously incomplete. There are still C programmers who insist that you can free() memory but still touch it for a "little while"[0], or access memory "off the end" of an allocation[1], for example. And ISO C still made the mistake of retaining pointers, which are a confusing mix of value and reference type. They aren't references because you are allowed to cast them to and from integers; and they can't be values because you can use them to modify other values. Because of this tension, we keep discovering new combinations of valid transformations on valid programs that cause miscompiles, and then we have to invent things like pointer provenance to fix them.

As far as I'm concerned, the only difference between Rust and C is that Rust is honest about its cleverness. C has to pretend to be simple while also out-clevering Rust (or at least, the safe subset of Rust).

[0] Usually in an attempt to emulate automatic memory management. Manual memory management does not work when passing complex structures across an API boundary, and the only options are to either expose custom deallocators (which means no optimizations even when they are sound), tell callers how to deallocate the data (which means no changing the data), or hack the allocator to do what you really want.

[1] It works for malware developers, it should work for me, right?

int_19h · 3y ago

> they can't be values because you can use them to modify other values

That seems like an arbitrary criterion. They're values because they behave like values when copying them around etc. You can't even use them to modify other values implicitly - you still need to use * to get an actual dereferenceable/assignable lvalue out of the pointer.

kmeisthax · 3y ago

What operator is used to modify the value doesn't matter, just the fact that you can do so. At the point where you use an assignable lvalue, the optimizer needs to know where that lvalue came from, which means that pointers need to be tracked like references.

My point is that because pointers act like both values and references, they are neither values nor references. This makes it impossible to soundly reason about them.
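The dual nature described here can be sketched in a few lines of C. `write_through_copy` is a hypothetical name. The pointer is copied like a value and even round-tripped through an integer, yet the final store through it changes a different object, which is what forces the optimizer to track where it came from. This sketch assumes the implementation provides `uintptr_t`, which is optional in the standard but near-universal in practice.

```c
#include <stdint.h>

/* Value-like: the pointer is copied and converted to an integer and back.
 * Reference-like: writing through the result modifies x, not the pointer. */
int write_through_copy(void) {
    int x = 1;
    int *p = &x;
    int *q = p;                            /* plain copy, like any value */
    uintptr_t bits = (uintptr_t)(void *)q; /* pointer as an integer */
    int *r = (int *)(void *)bits;          /* and back to a pointer */
    *r = 42;                               /* store lands in x */
    return x;
}
```

The round trip through `void *` and `uintptr_t` is the conversion the standard promises will compare equal to the original pointer. More creative integer arithmetic on `bits` is where provenance questions begin.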

2 more replies

burntsushi · 3y ago · 2 in thread

> Rust has the goal of putting as much smartness between what you type and what the compiler produces

No it doesn't. "smartness" is a means, not an end or a goal.

j-pb (OP) · 3y ago

> "It really does feel like rustc is a big step toward the mythical Sufficiently Smart Compiler, letting you write nice code without worrying about a suite of low-level gotchas."

> Hear more from Reliability Project Director

https://mobile.twitter.com/rust_foundation/status/1385310806...

burntsushi · 3y ago

Literally nothing there supports what you said. Don't make shit up.

1 more reply

pcwalton · 3y ago

You shouldn't be giving Zig (or C, for that matter) a free pass. If your language isn't memory safe, and it compiles to LLVM with the optimizations on, then it's inheriting most of the complexity of Rust's undefined behavior. (I say "most" because I suspect that Zig doesn't use attributes like noalias as much, but there is still a lot of complexity around, for instance, provenance that is orthogonal to those attributes.) It's LLVM, not the language frontend, that actually implements most of the memory model (via lowering to machine instructions).
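For readers who have not met `noalias`: the nearest source-level analogue in C is the `restrict` qualifier, which Clang is generally understood to lower to that attribute on function parameters. A minimal sketch, with `add_twice` as a made-up name: the qualifier is a promise from the programmer, not something the compiler checks, so the optimizer may load `*src` once and reuse it, and a caller that passes overlapping pointers has invoked UB without any diagnostic.

```c
/* `restrict` promises that dst and src never refer to the same object.
 * Under that promise the compiler may read *src a single time. Calling
 * this with dst == src breaks the promise and is undefined behaviour. */
int add_twice(int *restrict dst, const int *restrict src) {
    *dst += *src;
    *dst += *src;
    return *dst;
}
```

Called with two distinct objects, the function behaves as written. The interesting case is the one it forbids, and that is the kind of rule a language inherits from the backend whether or not its own documentation mentions it.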
