For example, a moved-out-from tree in C++ could represent this by having its inner root pointer be nullptr, and then its dtor would have to check for the root being nullptr, and all its member fns would have the danger of UB (nullptr dereference) if the caller called them on a moved-out shell. But the Rust version could use a non-nullable pointer type (Box), and its dtor and member fns would be guaranteed to act on a valid pointer.
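A minimal sketch of what that looks like on the Rust side (the Node/Tree names here are made up for illustration):

```rust
// Hypothetical sketch: a tree whose root is a non-nullable Box.
// There is no "moved-out shell" state to check for: once the Tree is
// moved, the compiler forbids using the old binding at all.
struct Node {
    value: i32,
    children: Vec<Node>,
}

struct Tree {
    root: Box<Node>, // never null; Drop can rely on a valid pointer
}

impl Tree {
    fn new(value: i32) -> Tree {
        Tree { root: Box::new(Node { value, children: Vec::new() }) }
    }
    // No null check needed: `self.root` is always valid.
    fn root_value(&self) -> i32 {
        self.root.value
    }
}

fn main() {
    let t = Tree::new(42);
    let moved = t; // `t` is now unusable; `t.root_value()` would not compile
    assert_eq!(moved.root_value(), 42);
}
```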
I don't know what nightmares you have. The only requirement that C++ specifies for moved-from objects is that they remain valid. Meaning, they can be safely destroyed.
You can go way out of your way and reuse an object that was just moved, but that's a decision you somehow made, and you have the responsibility of adding your reinitialization or even move logic to get that object back in shape. That is hardly something that sneaks up on you.
The more unfortunate, perhaps disingenuous, part is that the proposal paper tries to pretend you can add the destructive move later, if you need it, once you've got their C++ move.
But actually what they're proposing is that "move + create" + "destroy" = "move". That's not the same thing at all: it's extra work, and sure enough, in the real world it means extra work from compilers, from programmers, and sometimes (if the optimiser doesn't remove it) from the running program.
delete null is fine in C++ [1], so, assuming root either is a C++ object or a C type without members that point to data that also must be freed, its destructor can do delete root. And those assumptions would hold in ‘normal’ C++ code.
[1] https://en.cppreference.com/w/cpp/language/delete.html: “If ptr is a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's unspecified), but the default deallocation functions are guaranteed to do nothing when passed a null pointer.”
That's partly true, partly circular. Because moves work this way, it's harder to make a class that doesn't have empty states, so I don't design my class to avoid empty states, so the destructor has to handle them.
Note that, assuming the into_iter comes from IntoIterator, that's exactly what the for loop invokes to get an iterator from an iterable. So
for lr in LoadRequests.into_iter() {
is completely unnecessary verbosity;
for lr in LoadRequests {
will do the exact same thing. And the stdlib will generally implement the trait with the relevant semantics on shared and unique references, so
for lr in LoadRequests.iter_mut()
can generally be written
for lr in &mut LoadRequests
so you rarely need to invoke these methods outside of functional pipelines if you dislike them (some prefer them for clarity / readability).

Similar to all the times I defensively str(something) in Python, only to find that "oh, that has __str__ called on it anyway."
[0]: https://rust-lang.github.io/rust-clippy/master/index.html#ex...
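The equivalences described here are easy to check directly; this sketch uses a hypothetical load_requests vector standing in for LoadRequests:

```rust
fn main() {
    let load_requests = vec![1, 2, 3];

    // These two loops do exactly the same thing: the `for` loop calls
    // IntoIterator::into_iter on whatever follows `in`.
    let mut a = Vec::new();
    for lr in load_requests.clone().into_iter() { a.push(lr); }
    let mut b = Vec::new();
    for lr in load_requests.clone() { b.push(lr); }
    assert_eq!(a, b);

    // Likewise, iterating `&mut v` is the same as calling v.iter_mut().
    let mut v1 = vec![1, 2, 3];
    for lr in v1.iter_mut() { *lr += 10; }
    let mut v2 = vec![1, 2, 3];
    for lr in &mut v2 { *lr += 10; }
    assert_eq!(v1, v2);
}
```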
the difference between `const Data& d` and `const Data d` isn't accurately characterized as "a typo" -- it's a semantically significant difference in intent, core to the language, critical to behavior and outcome
even if the author "forgot" to add the `&` due to a typo, that mistake should absolutely have been caught by linting, tests, CI, or code review, well before it entered the code base
so not feelin' it, sorry
it is not productive or interesting to characterize this absolutely core property of the language as "a one char diff" that takes any kind of special cognitive effort to spot
I think the problem with `T &d` and `T d` is that these 2 declarations yield a "name" `d` that you can operate on very similarly. It's not necessarily about the reference declaration `T& d` being 1 char away from the value declaration `T d`.
While there is a significant semantic difference between declaring things as a value and as a reference (&), non-static member function invocation syntax is the same on both `&d` and `d`. You can't tell the difference without reading the original declaration, and the compiler will happily accept it.
Contrast this to `T *d` or `T d`. Raw pointers require different operations on `d` (deref, -> operator, etc). You're forced to update the code if you change the declaration because the compiler will loudly complain about it.
It shares the same problem as a type system with nullable-by-default reference types vs an explicit container of [0..1] element, Option<T>. Migrating existing code to an Option<> type will cause the compiler to throw a ton of explicit errors, and it becomes a breaking change if it was a public API declaration. On the other hand, you can never feel safe with nullable-by-default: a public API might claim in the documentation that it never returns `null`, but you can never tell from the type signature alone whether that's true.
Whether it's good or bad, I guess it depends on the language designer's decision. It is certainly more of a hassle to break & fix everything when updating the declaration, but it also can be a silent footgun as well.
And, as you said, this is a meaningful difference in intent, so linting can't just blanket complain on every single instance of a non-&-ed argument.
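To make the Option<T> contrast concrete, here is a minimal sketch with a made-up find_user function: unlike a nullable-by-default reference, the caller cannot touch the result without acknowledging the empty case.

```rust
// Hypothetical lookup API returning Option instead of a nullable pointer.
fn find_user(id: u32) -> Option<&'static str> {
    if id == 1 { Some("alice") } else { None }
}

fn main() {
    // The compiler rejects using the result directly (e.g. calling .len()
    // on it); you must unpack the Option first.
    let greeting = match find_user(2) {
        Some(name) => format!("hello, {name}"),
        None => "no such user".to_string(),
    };
    assert_eq!(greeting, "no such user");
}
```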
And the difference in writing down intent goes in the wrong direction: doing a full nested object clone should require adding code in any sane language, whereas in C++, making the code clone takes negative one characters (deleting the `&`).
Whereas in Rust, the only thing that's ever implicit is a bitwise copy on objects with constant size; everything else requires either adding &-s or .clone()s, or your code won't compile.
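A small sketch of that explicitness (consume is a made-up function): the deep copy is the thing you have to spell out, while passing by value is a move.

```rust
// Passing a Vec by value moves it; a deep copy must be written as .clone().
fn consume(v: Vec<i32>) -> usize { v.len() }

fn main() {
    let data = vec![1, 2, 3];
    let copy = data.clone();      // deep copy requires *adding* code
    assert_eq!(consume(data), 3); // moves `data`; using `data` afterwards won't compile
    assert_eq!(consume(copy), 3);
}
```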
I've been writing some Swift code in recent years. The most frequent source of bugs has been making incorrect assumptions on whether a parameter is a class or a struct (reference or value type). C# has the same issue.
It's just a terrible idea to make the value/reference distinction at the type level.
It extends it a bit, too, with `out` meaning that the referenced argument is initialized by the function, not read.
If something is const, whether to pass it by reference or value is a decision the compiler should make. There's a size threshold, and it varies with the target hardware. It might be 2 bytes on an Arduino and 16 bytes on a machine with 128-bit arithmetic. Or even as big as a cache line. That optimization is reportedly made by the Rust compiler. It's an old optimization, first seen in Modula 1, which had strict enough semantics to make it work.
Rust can do this because the strict affine type model prohibits aliasing. So the program can't tell if it got the original or a copy for types that are Copy. C++ does not have strong enough assurances to make that a safe optimization. "-fstrict-aliasing" enables such optimizations, but the language does not actually validate that there is no aliasing.
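A small sketch of the Copy behavior in question (Point and sum are made-up names): the caller keeps a usable, unchanged original and the callee cannot alias it, which is what licenses the pass-by-reference-or-value choice.

```rust
// For a Copy type, "passing by value" leaves the caller's original
// intact and observable, so the program cannot distinguish whether the
// callee internally got the original or a copy.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point { x: i32, y: i32 }

fn sum(p: Point) -> i32 { p.x + p.y }

fn main() {
    let p = Point { x: 1, y: 2 };
    let s = sum(p); // p is copied, not moved
    assert_eq!(s, 3);
    assert_eq!(p, Point { x: 1, y: 2 }); // original still valid and unchanged
}
```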
If you are worried about this, you have either used a profiler to determine that there is a performance problem in a very heavily used inner loop, or you are wasting your time.
Whether an argument is passed in a register or not is unfortunately much more nuanced than this: it depends on the ABI calling conventions (which vary depending on OS as well as CPU architecture). There are some examples where the argument will not be passed in a register despite being "small enough", and some examples where the argument may be split across two or more registers.
For instance, in the x86-64 ELF ABI spec [0], the type needs to be <= 16 bytes (despite registers only being 8 bytes), and it must not have any nontrivial copy / move constructors. And, of course, only some registers are used in this way, and if those are used up, your value params will be passed on the stack regardless.
[0] Section 3.2.3 of https://gitlab.com/x86-psABIs/x86-64-ABI
But it needs to be conservative, of course; in general you can't do this.
struct Data {
    // Vec does not implement Copy, so Data is not Copy either
    data: Vec<i32>,
}
// Equivalent to "passing by const-ref" in C++
fn BusinessLogic(d: &Data) {
    d.DoThing();
}
// Equivalent to "move" in C++
fn FactoryFunction(d: Data) -> Owner {
    let owner = Owner { data: d };
    // ...
    return owner;
}
Is this really true?

I believe in Rust, when you move a non-Copy type, like in this case, it is up to the compiler whether it passes a reference or makes a physical copy.
In my (admittedly limited) understanding of Rust semantics calling
FactoryFunction(d: Data)
could physically copy d despite it being non-Copy. Is this correct?

EDIT:
Thinking about it, the example is probably watertight because d is essentially a Vec (as Ygg2 pointed out).
My point is that if you see
FactoryFunction(d: Data)
and all you know is that d is non-Copy, you should not assume it is not physically copied on the function call. At least that is my belief.

I believe the answer is technically yes. IIRC a "move" in Rust is defined as a bitwise copy of whatever is being moved, modulo optimizations. The only difference is what you can do with the source afterwards: for non-Copy types, the source is no longer considered accessible/usable; with Copy types, it still is.
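That definition is observable: moving a String bitwise-copies its (pointer, length, capacity) handle but not the heap buffer it points to, as this small check shows.

```rust
fn main() {
    let s = String::from("hello");
    let heap_ptr = s.as_ptr(); // address of the heap buffer
    let t = s;                 // move: `s` is no longer usable
    // The heap buffer was not copied; only the three-word handle moved.
    assert_eq!(t.as_ptr(), heap_ptr);
}
```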
On that machine Data is "physically" just the Vec, which is three 64-bit values: a pointer to i32 ("physically" on this machine a virtual address), an integer length, and an integer capacity. The machine has a whole bunch of GPRs, so sure, one way the compiler might implement FactoryFunction is to "physically" copy those three values into CPU registers. Maybe say RAX, RCX, RDX?
Actually though there's an excellent chance that this gets inlined in your program, and so FactoryFunction never really exists as a distinct function, the compiler just stamps out the appropriate stuff in line every time we "call" this function, so then there was never a "parameter" because there was never a "function".
The point I am trying to make is more general:
I believe that when you have a type in Rust that is not Copy, it will never be implicitly copied in a way that leaves you with two visible instances, but it is not guaranteed that Rust never implicitly memcopies all its bytes.
I have not tried it but what I had in mind instead of the Vec was a big struct that is not Copy. Something like:
struct Big<const M: usize> {
buf: [u8; M],
}
// Make it non-Copy.
impl<const M: usize> Drop for Big<M> {
fn drop(&mut self) {}
}
From my understanding, to know whether memory is shoveled around, it is not enough to know the function signature and whether the type is Copy or not. The specifics of the type matter.

If you have
Vec<i32>
// newtype struct
struct Data { data: Vec<i32> }
// newtype enum in Rust
// (possibly, but not 100% sure)
// enum OneVar { Data(Vec<i32>) }
From my experiments with the newtype pattern, operations implemented on the inner data and on the newtype struct yielded the same assembly. To be fair, in my case it wasn't a Vec but a [u8; 64] and a u32.

My favourite as a Unix person is Option<OwnedFd>. In a way, Option<OwnedFd> is the same as the classic C int file descriptor: it has the exact same representation, 32 bits of aligned integer. But Rust's type system means we know None isn't a file descriptor, whereas it's too easy for the C programmer to forget that -1 isn't a valid file descriptor. Likewise, the Rust programmer can't mistakenly do arithmetic on file descriptors: if we intend to count up some file descriptors but instead sum them, in C that compiles and isn't what you wanted; in Rust it won't compile.
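That representation claim can be checked directly (Unix-only; it relies on the niche optimization that std documents for OwnedFd, since -1 is never a valid descriptor):

```rust
// Option<OwnedFd> occupies the same 32 bits as a raw C file descriptor:
// the None case is represented by the otherwise-invalid value.
#[cfg(unix)]
fn main() {
    use std::mem::size_of;
    use std::os::fd::{OwnedFd, RawFd};
    assert_eq!(size_of::<Option<OwnedFd>>(), size_of::<RawFd>());
}

#[cfg(not(unix))]
fn main() {}
```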
If people could keep the entirety of the J.2 appendix in their mind at all times, we would not have these issues. And if they had the entirety of appendix J in mind, all C code would be portable.
Or if people just always ran -Wall -Wpedantic -Wall_for_real_this_time -fsanitize=thread,memory,address,leaks,prayers,hopes,dreams,eldritch_beings,elder_gods -fno-omit-frame-pointer
I mean, if this was all it took, then C and C++ programs would be as safe as Rust. Which is not what we see in practice. And it's not like C programmers are average web devs; it's a relatively niche and well-versed community.
If you care about performance, you measure it. If you don't measure performance, you don't care about it.
You have to do it correctly, or you might just be measuring when your system is pulling updates, how big your username is, or the performance of the least critical thing in your app.
And at worst you can speed up your worst-performing function only to cause a major slowdown in overall performance.
This reminds me of arguing more than once with JS developers about the dangers of loose typing (especially in the case of JS) and getting the inevitable reply: "I just keep track of my type casting."
Another problem with C++ references is that they aren't really reference types, they are aliases, so they have wonky semantics and crazy nonsensical features like `const T&` doing lifetime extension
All good linters complain about const buffer data missing the ampersand btw
Regarding const by-value parameters, they should never appear in function declarations (without definition) since that doesn’t enforce anything. In function definitions, you can use const refs (which have lifetime extension) to achieve the same const-correctness, and const refs are better for large types.
Admittedly this further proves the point that c++ is needlessly complicated for users, and I agree with that.
The second refinement is that you can use `const X&` as an optimization of `X`. (Perfectly safe for parameters; somewhat treacherous for return values.) Passing by `X&` without the const, or by `const X` without the ampersand, are both typos, and you should regularly use tooling to find and fix that kind of typo.
https://quuxplusone.github.io/blog/2019/01/03/const-is-a-con...
And that's it, for business-logic code. If you're writing your own resource-management type, you'll need to know about `X(X&&)` and `X& operator=(X&&)`, but ordinary business-logic code never does.
"What about `X&` for out-parameters?" Pass out-parameters by pointer. It's important and helpful to indicate their out-parameter-ness at the call-site, which is exactly what passing by pointer does. (And the pointer value itself will be passed by value, just like in C.)
"What about return by const value, like Scott Meyers recommended 20–30 years ago?" No, don't do that. It disables the ability to move-assign or move-construct from the return value, which means it's a pessimization. Scott found this out, retracted that advice in 2009, and correctly issued the opposite advice in his 2014 book.
At work I use a Clang patched with "-Wqual-class-return-type" to report return-by-const-value typos — since, again, `const X getter()` is almost always a typo for `const X& getter()`.
You can use that compiler too: https://godbolt.org/z/7177MTfb8
Exactly, this is not an issue in any reasonable setup because static analysis catches (and fixes!) this reliably.
> but evidently these issues go unnoticed until a customer complains about it or someone actually bothers to profile the code.
No
With Rust I have one job that runs tests and another that runs cargo build --release and I'm done...
Rust: Typo? Now it just doesn't compile anymore. Worst case is that the compiler does a bad job at explaining the error and you don't find it immediately.
C++: Typo? Good luck. Things may now be broken in so subtle and hard to figure out ways it may haunt you till the rest of your days.
But that of course depends on the nature of the typo. Now I should go and read the article.
By the way, the project considers this a bug and accepts reports for it. On many occasions they are easy to fix; in others, large refactors are needed. But being aware of the case is the necessary first step to making things better.